Transactions on Machine Learning Research (TMLR). Journal-style continuous review with no deadline. Authors and reviewers face different incentives than at deadline-driven venues: does that translate into different rejection patterns?
TMLR has no fixed cycle: the data covers all submissions visible when the statistics were computed, predominantly from 2023-2024. It also doesn't use numeric ratings (recommendations are categorical: Accept as is / Accept with minor revision / Reject), which is why this page does not show a rating histogram.
Visible submissions: 6,661 (total exposed on OpenReview)
Acceptance rate: 71.8%
Critiques analysed: 3,001 (across 300 rejected papers)
Patterns identified: 10
Committee decisions
How final decisions are distributed across visible submissions.
Accept as is: 42.8%
Accept with minor revision: 29.0%
Reject: 28.2%
Weakness map
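Each bar is a recurrent pattern across the reviews of rejected papers. The width represents how much weight that pattern carries among the analysed critiques. The percentages here are consistent with shares of the 2,700 critiques that fall into one of the ten patterns, so they run slightly higher than the "of total" figures in the per-pattern sections, which are relative to all 3,001 analysed critiques.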
Experimental results and datasets: 27.9% (752 items)
Insufficient mathematical justification: 15.3% (412 items)
Questionable problem formulation: 11.2% (303 items)
Vagueness and imprecise writing: 10.8% (291 items)
Limitations of the proposed method: 8.3% (223 items)
Missing comparisons with recent methods: 7.0% (189 items)
Unbalanced section structure: 6.5% (176 items)
Limited discussion of prior work: 4.9% (132 items)
Algorithm not reader-ready: 4.8% (129 items)
Figures and axes: 3.4% (93 items)
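The two sets of percentages on this page (the weakness map and the per-pattern "of total" figures) can both be reproduced from the item counts. The sketch below assumes the map divides by the number of critiques assigned to one of the ten patterns, while the per-pattern figures divide by all 3,001 analysed critiques. That reading is inferred from the arithmetic, not stated by the page, and the names `counts`, `shares`, and `TOTAL_CRITIQUES` are illustrative.

```python
# Item counts per pattern, copied from the weakness map.
counts = {
    "Experimental results and datasets": 752,
    "Insufficient mathematical justification": 412,
    "Questionable problem formulation": 303,
    "Vagueness and imprecise writing": 291,
    "Limitations of the proposed method": 223,
    "Missing comparisons with recent methods": 189,
    "Unbalanced section structure": 176,
    "Limited discussion of prior work": 132,
    "Algorithm not reader-ready": 129,
    "Figures and axes": 93,
}

TOTAL_CRITIQUES = 3001  # the "Critiques analysed" headline figure


def shares(item_counts, denominator):
    """Percentage share of each pattern, rounded to one decimal place."""
    return {
        name: round(100 * n / denominator, 1)
        for name, n in item_counts.items()
    }


# Assumption: the map normalises by the critiques that landed in a pattern.
clustered = sum(counts.values())
map_share = shares(counts, clustered)

# Assumption: the per-pattern cards normalise by every analysed critique.
total_share = shares(counts, TOTAL_CRITIQUES)

for name in counts:
    print(f"{name}: {map_share[name]}% of clustered, {total_share[name]}% of total")
```

Under these assumptions the ten patterns cover 2,700 of the 3,001 critiques, which is why every bar in the map is a little wider than the matching "of total" figure.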
Patterns, one by one
Sorted by weight. For each pattern we show what it represents, how reviewers phrase it, and a practical takeaway you can apply before submitting your next paper.
#01
Experimental results and datasets
25.1% of total, 752 items
The largest cluster collects experimental-section problems: missing datasets, weak comparisons, results that don't convince reviewers of the impact. TMLR explicitly weighs empirical rigour as a top acceptance criterion.
The empirical evaluation is limited to a single dataset; conclusions about generalisation are not warranted.
Tables show point estimates only. Without confidence intervals, the gains over the baselines could be noise.
Several recent baselines that solve a similar problem are absent from the comparison.
Practical takeaway. TMLR does not reward elegant but poorly validated ideas. If your paper hinges on a theoretical advance, validate it on at least two recognised benchmarks before submitting.
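One of the critiques quoted above asks for confidence intervals instead of point estimates. A minimal sketch of one way to report them, using a percentile bootstrap over per-seed results with only the Python standard library. The function name `bootstrap_ci` and the accuracy values are hypothetical, made up for illustration; with only a handful of seeds the interval is rough, and other interval methods may suit a given experiment better.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Returns (mean, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical per-seed accuracies, invented for this example only.
baseline = [0.812, 0.807, 0.819, 0.803, 0.815]
proposed = [0.821, 0.809, 0.826, 0.811, 0.818]

for name, runs in [("baseline", baseline), ("proposed", proposed)]:
    mean, lower, upper = bootstrap_ci(runs)
    print(f"{name}: {mean:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

If the two intervals overlap heavily, a table of point estimates alone would invite exactly the "could be noise" objection.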
#02
Insufficient mathematical justification
13.7% of total, 412 items
Proofs with logical jumps, unstated assumptions, equations that conflate two different concepts. TMLR reviewers are particularly strict on formal soundness.
Keywords: equation, eq, theorem, proof, does, function, defined, lemma
How reviewers phrase it
**The geometry-aware distance is not well-justified**. Although it seems novel, a new variant of Cramer distance in Section 10.1 is engineered with a sloppy and heuristic explanation.
The proof of Lemma 2 uses a different definition of the loss than Section 3; the reader has to reconcile them.
The bound in Theorem 1 hides a Lipschitz constant that is never bounded.
Practical takeaway. If your paper is theoretical, dedicate a full appendix to detailed proofs. In the body, keep the sketch but explicitly reference each step of the appendix.
#03
Questionable problem formulation
10.1% of total, 303 items
The reviewer challenges the paper's setting: training assumptions don't hold, the method only works in unrealistic regimes, the research question is slightly artificial.
The central issue lies in a misformulated problem setting. The central setting — improving video understanding via token compression on a training-free, image-only VLM — has limited practical impact.
The training-time assumptions don't hold once the model is deployed; the gap is not discussed.
The setting requires access to ground-truth at inference time, which defeats the purpose of the method.
Practical takeaway. Before writing, run an honest test on your setting: would anyone outside your sub-area pay for the advance you promise? If not, adjust the motivation.
#04
Vagueness and imprecise writing
9.7% of total, 291 items
Ambiguous sentences that block evaluation. Reviewers cannot say whether something is correct because they cannot tell exactly what the author is claiming.
This (and many other instances of) vagueness prevents me from evaluating the correctness of the paper.
The phrase `the model handles this` appears in three different sections, each with a different referent for `this`.
The contribution claims are stated in active voice but the experiments are reported impersonally; it is hard to attribute results.
Practical takeaway. If a sentence has two interpretations, the reviewer will use the one against you. Rewrite with explicit subject, verb, and object. No `it`, no `this`, no `the model`.
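The check suggested above can be partly automated. A rough sketch, assuming a plain-text draft and a naive sentence splitter: it flags every sentence containing one of the three referents named in the takeaway. The names `VAGUE` and `flag_vague` are hypothetical, and a hit is only a prompt to re-read the sentence, not proof that it is ambiguous.

```python
import re

# The three referents the takeaway warns about; extend for your own draft.
VAGUE = re.compile(r"\b(it|this|the model)\b", re.IGNORECASE)


def flag_vague(text):
    """Return (sentence, hits) pairs for sentences containing a vague referent."""
    # Naive split on sentence-final punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        hits = [m.group(0).lower() for m in VAGUE.finditer(sentence)]
        if hits:
            flagged.append((sentence, hits))
    return flagged


draft = (
    "The model handles this. "
    "Our encoder maps each token to a 256-dimensional vector."
)

for sentence, hits in flag_vague(draft):
    print(f"{hits}: {sentence}")
```

Here only the first sentence of `draft` is flagged; the second names its subject, verb, and object explicitly.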
#05
Limitations of the proposed method
7.4% of total, 223 items
The method works, but only under restrictive conditions: it requires specific pre-trained models, pre-aligned data, strong distributional assumptions. The applicability is narrow.
The proposed method still relies on pretrained CMs. It somehow restricts the applicability.
Performance only holds when the input distribution matches the training one; out-of-distribution behaviour is not characterised.
The method assumes access to a calibration set whose size is not discussed.
Practical takeaway. A `prerequisites` panel in the introduction is more honest than hiding the limitations. TMLR reviewers penalise overpromising more heavily than they penalise openly stated limitations.
#06
Missing comparisons with recent methods
6.3% of total, 189 items
TMLR reviewers know the area and quickly notice recent methods missing from the comparison, particularly work published while the paper was under review.
The methodologies in Sections 4 and 5 are not deeply discussed. Methods are listed without comparison.
Several methods published in 2024 in the same sub-area are absent.
The comparison is restricted to pre-2022 baselines; the recent ones likely change the conclusions.
Practical takeaway. Before submitting, scan Twitter/Bluesky/arXiv for the last 6 months in your niche. If recent papers exist and you don't cite them, the reviewer will cite them for you — and not in a friendly way.
#07
Unbalanced section structure
5.9% of total, 176 items
Overlong sections (five-page introductions, half-page paragraphs) dilute the argument. TMLR allows additional pages, but reviewers value concision.
I find some parts slightly too long, e.g. the 5 and 6 paragraphs of the intro, Section 5 etc.
The methodology spreads across three sections that could be one.
Paragraph 4 of Section 3 repeats material from Paragraph 2.
Practical takeaway. Each introduction paragraph should answer a single question (`what problem`, `why it matters`, `what we do`, `what we measure`, `what we find`). If two paragraphs answer the same one, merge them.
#08
Limited discussion of prior work
4.4% of total, 132 items
Same pattern as in other venues, but TMLR reviewers are particularly strict about categorisation: they ask for prior work to be grouped by approach, not by chronology.
Limited discussion of related work: While related work is mentioned, more details on the similarities and differences to previous work and other categorical approaches would help.
The related-work section enumerates papers; we'd benefit from a table comparing them on shared axes.
Prior work in the same area is grouped chronologically; group it by methodology instead.
Practical takeaway. A two-axis taxonomy (`type of approach` × `type of problem`) in related work, ideally as a table, closes this complaint.
#09
Algorithm not reader-ready
4.3% of total, 129 items
The algorithm is presented without the definitions it needs (variables appearing without being introduced, indices reused). Small problem with big impact when readers go deep.
Can you introduce C, J and E before the algorithm 1?
The algorithm uses subscripts t and i interchangeably.
Step 4 references a function f_θ that has only been described informally in the previous section.
Practical takeaway. Your algorithm should be readable without going back to the body. If a reader would need to look up a variable, move its definition right before the algorithm block.
#10
Figures and axes
3.1% of total, 93 items
Small but consistent cluster: overlapping figures, axes without appropriate scaling, missing captions. The complaint is direct and usually has a mechanical fix.
Keywords: figure, figures, text, does, axis, page, overlap, labels
How reviewers phrase it
Figure 2 and 3 overlap a lot, why not just combine them into a single figure?
The y-axis of Fig. 4 uses a linear scale; a log scale would make the differences readable.
Caption of Fig. 5 is two words; please add what the reader is looking at.
Practical takeaway. Before the final version: every figure should be one zoom-step from its final form, with axes labelled, legend in place, and a caption that has a subject and a verb.