ICLR 2025

The ICLR 2025 edition uses the same format as the 2024 one, on an even larger dataset: the API exposes 11,672 submissions and 5,019 rejections. Comparing the two editions shows how reviewer concerns evolve year over year.

Source: OpenReview

Visible submissions: 11,672 (total exposed on OpenReview)
Acceptance rate: 42.5%
Critiques analysed: 4,456 (across 300 rejected papers)
Patterns identified: 10

Rating distribution

Scores reviewers assigned in their reviews.

Committee decisions

How final decisions are distributed across visible submissions.

Reject: 57.5%
Accept (Poster): 35.7%
Accept (Spotlight): 4.4%
Accept (Oral): 2.4%
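The acceptance rate in the summary is simply the sum of the three accept tiers. A quick Python check that uses only the shares listed here (no per-decision counts are assumed):

```python
# Decision shares (%) as listed above for the visible submissions.
decisions = {
    "Reject": 57.5,
    "Accept (Poster)": 35.7,
    "Accept (Spotlight)": 4.4,
    "Accept (Oral)": 2.4,
}

# Acceptance rate = sum of every "Accept (...)" tier, rounded to one decimal.
acceptance_rate = round(
    sum(share for name, share in decisions.items() if name.startswith("Accept")), 1
)
# All tiers together should cover every visible submission.
total = round(sum(decisions.values()), 1)

print(f"Acceptance rate: {acceptance_rate}%")
print(f"All decisions:   {total}%")
```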

Weakness map

Each bar is a recurring pattern in the reviews of rejected papers. Its width is the pattern's share of the 4,089 critiques covered by the ten patterns; the per-pattern sections below give each pattern's share of all 4,456 critiques analysed.

Loose definitions and evaluation: 30.2% (1,235 items)
Algorithms and equations unclear: 18.8% (767 items)
Central contribution under-articulated: 12.2% (500 items)
Incremental method: 9.1% (372 items)
Poorly presented experimental results: 6.9% (284 items)
Positioning and writing: 5.5% (226 items)
Reliance on synthetic or expensive data: 5.1% (209 items)
Computational cost ignored: 4.8% (196 items)
Typos: 3.7% (152 items)
Inconsistent figures: 3.6% (148 items)
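The bar percentages and the per-pattern "of total" figures use different denominators: the bars divide by the 4,089 critiques that fall into one of the ten patterns, while each pattern section divides by all 4,456 critiques analysed. A minimal Python sketch that recomputes both columns from the counts on this page (the two totals are the only inputs besides the counts):

```python
# Critique counts per pattern, taken from the weakness map.
counts = {
    "Loose definitions and evaluation": 1235,
    "Algorithms and equations unclear": 767,
    "Central contribution under-articulated": 500,
    "Incremental method": 372,
    "Poorly presented experimental results": 284,
    "Positioning and writing": 226,
    "Reliance on synthetic or expensive data": 209,
    "Computational cost ignored": 196,
    "Typos": 152,
    "Inconsistent figures": 148,
}
TOTAL_CRITIQUES = 4456  # all critiques analysed across the 300 rejected papers

# Critiques that fall into one of the ten patterns (denominator of the bars).
in_patterns = sum(counts.values())


def share(n: int, denom: int) -> float:
    """Percentage of `denom` represented by `n`, rounded to one decimal place."""
    return round(100 * n / denom, 1)


for name, n in counts.items():
    print(
        f"{name:40s} {share(n, in_patterns):5.1f}% of patterns  "
        f"{share(n, TOTAL_CRITIQUES):5.1f}% of all critiques"
    )
```

The gap of 367 critiques between the two denominators (4,456 minus 4,089) is the part of the analysed critiques not covered by the ten bars.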

Patterns, one by one

Sorted by weight. For each pattern we show what it represents, how reviewers phrase it, and a practical takeaway you can apply before submitting your next paper.

#01

Loose definitions and evaluation

27.7% of all critiques · 1,235 items

A large, heterogeneous cluster. It mixes complaints about concepts the paper does not define crisply (what exactly is a `scene`, a `task`, an `environment`) with doubts about whether the evaluation covers what was promised.

Top terms: models · model · methods · performance · evaluation · based · data · scene

How reviewers phrase it

**Unclear Definition and Scope of "Scene"**: The paper does not clearly define what constitutes a `scene` in the context of user behavior modeling. While scenes are described as features, the boundary between scenes is fuzzy.

The evaluation does not match the claims of the introduction. The introduction promises generalization across domains; the experiments only test in-domain.

Several modeling choices (loss weights, normalisation, augmentation) are described in passing without justification.

Practical takeaway. Add an operational-definitions paragraph to the introduction that defines each key term in one sentence. If you can't, refine the idea before writing.

#02

Algorithms and equations unclear

17.2% of all critiques · 767 items

Reviewers grasp the high-level idea but can't reconstruct what the paper does concretely: missing equations, algorithms with skipped steps, insufficient technical depth.

Top terms: authors · algorithm · eq · equation · unclear · does · defined · step

How reviewers phrase it

Unfortunately, the paper is lacking in technical depth. The idea and its implementation seem straightforward to me, and the results are unsurprising — so I find the mathematical content insufficient.

Algorithm 1 omits the update rule for the auxiliary buffer. Without it the reader cannot reproduce the experiments.

Eq. (5) introduces a notation that is reused with a different meaning in Eq. (12).

Practical takeaway. If your method is reproducible, show it: put pseudocode in the main body (not only in the appendix) and list every referenced hyperparameter with its default value.
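One way to keep the listed defaults in sync with the code is to declare them once and generate the paper's hyperparameter table from that declaration. A minimal Python sketch; the field names and default values are hypothetical placeholders, not taken from any reviewed paper:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hyperparameters:
    """Hypothetical hyperparameters; replace with the ones your method uses."""
    learning_rate: float = 3e-4
    batch_size: int = 128
    buffer_size: int = 10_000  # e.g. the size of an auxiliary buffer used in Algorithm 1
    loss_weight: float = 0.5


def to_latex_rows(hp: Hyperparameters) -> list[str]:
    """One LaTeX table row ("name & value \\\\") per field, in declaration order."""
    rows = []
    for f in fields(hp):
        name = f.name.replace("_", "\\_")  # escape underscores for LaTeX
        rows.append(f"{name} & {getattr(hp, f.name)} \\\\")
    return rows


for row in to_latex_rows(Hyperparameters()):
    print(row)
```

Because the training code and the table read the same dataclass, a changed default cannot silently drift away from what the paper reports.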

#03

Central contribution under-articulated

11.2% of all critiques · 500 items

The paper does not articulate a single contribution the reader can take away in one sentence. Reviewers write that `the main contribution is not clear`, or admit the topic falls outside their area because they cannot even situate the work.

Top terms: paper · does · weaknesses · paper does · main · contribution · interesting · make

How reviewers phrase it

The main contribution of this paper is unclear to me. There seem to be three independent ideas, none of which is fully developed.

I find the paper interesting but I'm not sure what the central claim is.

The introduction lists four contributions, but Section 3 develops only one of them.

Practical takeaway. The fridge test: if you stuck your paper to the fridge for a week, would you remember it by one specific idea or by `something about embeddings`? If the latter, refine the lead.

#04

Incremental method

8.3% of all critiques · 372 items

The proposed method is reasonable but reviewers see a combination of existing parts rather than a distinguishing idea. The words `straightforward`, `incremental`, or `simple combination` appear.

Top terms: method · proposed · proposed method · methods · paper · authors · novelty · simple

How reviewers phrase it

**The proposed method**. Although the authors give a good and novel assumption, the proposed methods seems simple and incremental without insightful design.

The method is essentially A + B with minor adjustments. The paper would benefit from arguing why the combination is non-trivial.

Conceptually the contribution feels small; the engineering work is substantial but the reader is left wondering about the take-away idea.

Practical takeaway. If the combination is genuinely novel, dedicate a paragraph to explaining why it is non-trivial. If it is incremental, frame the work as engineering and validation, and quantify the trade-offs.

#05

Poorly presented experimental results

6.4% of all critiques · 284 items

Tables are dense and hard to read, key findings are not separated from supporting detail, and error metrics or comparative context are missing.

Top terms: results · experimental · experimental results · paper · authors · experiments · table · numbers

How reviewers phrase it

It is hard to interpret the experimental results as the tables are full of numbers. The authors may wish to present the results for one length and relegate the rest to the appendix.

Table 3 lists eight columns of results without confidence intervals; it is impossible to tell which differences are significant.

The headline number is buried in the middle of a multi-page table.

Practical takeaway. One main table with the key metric in the first column. Details belong in the appendix. The reviewer should not have to hunt for your headline result.
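For the headline column, reporting a mean with a confidence interval over independent runs answers the "which differences are significant?" complaint directly. A minimal Python sketch of a 95% Student-t interval; the t critical values are standard table entries, and the per-seed scores are made-up placeholders, not results from any paper:

```python
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_ci95(scores: list[float]) -> tuple[float, float]:
    """Mean and half-width of a 95% t-interval over 2 to 11 independent runs."""
    n = len(scores)
    if not 2 <= n <= 11:
        raise ValueError("this sketch only tabulates t-values for 2 to 11 runs")
    mean = statistics.mean(scores)
    half_width = T95[n - 1] * statistics.stdev(scores) / n ** 0.5
    return mean, half_width


# Hypothetical accuracies over 5 seeds per method.
results = {
    "Ours": [81.2, 80.7, 81.9, 81.0, 81.4],
    "Baseline": [79.8, 80.9, 79.5, 80.4, 80.1],
}

# One row per method, key metric formatted as "mean ± half-width".
for method, scores in results.items():
    m, h = mean_ci95(scores)
    print(f"{method:10s} {m:.1f} ± {h:.1f}")
```

Overlapping intervals do not prove two methods are equivalent, but non-overlapping ones give the reviewer a quick, honest signal without reading a multi-page table.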

#06

Positioning and writing

5.1% of all critiques · 226 items

A mix of writing issues and a weak connection to related work. Some reviewers spot near-verbatim copies of passages from earlier papers; others ask for the context section to be reorganised.

Top terms: section · writing · paper · related · related work · work · novelty · structure

How reviewers phrase it

**Section 3.2** is almost a verbatim copy of Section 2.2 (EFO-1 Queries and Answers) from the FIT paper.

Related work reads as a list of citations rather than a comparison; reorganise by axis (data, model, evaluation).

The novelty argument depends on contrasts the related-work section never makes explicit.

Practical takeaway. If you reuse an explanation from another paper, cite the source and quote it. If your related-work section is a chronological list, turn it into a thematic comparison.

#07

Reliance on synthetic or expensive data

4.7% of all critiques · 209 items

The method only works with simulation-generated, clean synthetic, or expensive datasets. Reviewers ask whether the proposal survives on real-world data.

Top terms: data · training · distribution · model · synthetic · synthetic data · real · available

How reviewers phrase it

Relies on large eddy simulation (LES) data for optimal results, which, while cheaper than DNS data, still adds to the data requirements and may not always be available.

All experiments use synthetically generated distribution shifts. The robustness to natural shifts is not tested.

Training requires paired data that is not realistic to collect at scale.

Practical takeaway. A dedicated `data requirements` section: how much, of what quality, at what cost. If your method only works with perfect data, declare the limitation clearly.

#08

Computational cost ignored

4.4% of all critiques · 196 items

The method introduces expensive operations (per-layer SVD, second-order optimisation, quadratic attention), and reviewers flag that the cost is neither quantified nor compared with cheaper alternatives.

Top terms: computational · complexity · cost · time · efficiency · computational complexity · scaling · training

How reviewers phrase it

The computational cost introduced by performing singular value decomposition on layer gradients has not been adequately assessed.

Training time and memory are not reported. For a method that adds a per-step optimisation, this is the key axis.

Scaling behaviour with model size is not discussed.

Practical takeaway. A `time per iteration` or `FLOPs` column in your main table. If you lose on that axis, explicitly argue why it's worth it.
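A `time per iteration` column only needs a careful wall-clock measurement. A minimal Python sketch using `time.perf_counter`; the two step functions are hypothetical CPU stand-ins for a baseline and a costlier method. With asynchronous GPU frameworks the step must also wait for the device (in PyTorch, for example, by calling `torch.cuda.synchronize()` inside it), otherwise the timer only measures kernel launches:

```python
import statistics
import time
from typing import Callable


def seconds_per_iteration(step: Callable[[], object], iters: int = 50, warmup: int = 5) -> float:
    """Median wall-clock seconds per call of `step`, measured after a few warm-up calls."""
    for _ in range(warmup):
        step()  # absorb one-off costs: caches, lazy initialisation, compilation
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)  # the median is robust to occasional OS hiccups


# Hypothetical stand-ins for one training step of a baseline and of the proposed method.
def baseline_step() -> int:
    return sum(i * i for i in range(20_000))


def method_step() -> int:
    return baseline_step() + sum(i * i for i in range(20_000))  # extra per-step work


base = seconds_per_iteration(baseline_step)
ours = seconds_per_iteration(method_step)
print(f"baseline: {base * 1e3:.2f} ms/iter")
print(f"method:   {ours * 1e3:.2f} ms/iter ({ours / base:.1f}x the baseline)")
```

Reporting the ratio next to the accuracy gain lets the reviewer judge the trade-off at a glance.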

#09

Typos

3.4% of all critiques · 152 items

Detail errors: typos, undefined symbols, specific lines with the wrong character. They don't kill an otherwise solid paper, but they erode trust in the rest.

Top terms: line · typo · defined · typo line · mathbf · mean · fix · should

How reviewers phrase it

Line 427 "integrogate" should be interrogate

$\mathbf{x}$ is used in Eq. (3) but never defined.

Algorithm 1, line 5: the index should be t-1, not t.

Practical takeaway. A typos-only pass in the last two days before the deadline. Boring work, very high return.
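Part of that pass can be automated. A minimal Python sketch that reports doubled words ("the the") with their line numbers in a source file; it covers only this one class of typo, misses repeats split across a line break, and reports source-file lines rather than the line numbers printed in the submitted PDF. Every hit still needs a human look, since some repeats ("that that") are legitimate:

```python
import re

# A word immediately followed by the same word, ignoring case ("The the").
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def doubled_words(text: str) -> list[tuple[int, str]]:
    """(line number, repeated word) for each doubled word within a single line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DOUBLED.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical draft text.
draft = "We evaluate the the model on\nthree benchmarks. Results Results are in Table 2."
for lineno, word in doubled_words(draft):
    print(f"line {lineno}: doubled '{word}'")
```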

#10

Inconsistent figures

3.3% of all critiques · 148 items

Colours that change between figures, missing legends, captions that aren't self-contained, axes without units. Small failures that break the paper's visual narrative.

Top terms: figure · figures · example · legend · example figure · caption · color · axis

How reviewers phrase it

Color inconsistency in the plots of Figure 16.

Caption of Fig. 4 doesn't explain the legend.

The y-axis of Figure 6 has no units; readers have to look at the body text to interpret it.

Practical takeaway. A fixed colour palette for the whole paper, tied to entities (not to positions in a figure). Self-contained captions. Always units on the axes.
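Tying colours to entities is easiest when the mapping lives in one place that every plotting script imports. A minimal Python sketch using the Okabe-Ito colour-blind-safe palette; the entity names are hypothetical placeholders:

```python
# Okabe-Ito colour-blind-safe palette (hex codes).
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

# Hypothetical entities: list every method or dataset that appears in any figure, once.
ENTITIES = ["Ours", "Baseline A", "Baseline B", "Ablation"]
assert len(ENTITIES) <= len(OKABE_ITO), "more entities than palette colours"

# Fixed entity -> colour mapping, defined once for the whole paper.
COLOURS = {name: OKABE_ITO[i] for i, name in enumerate(ENTITIES)}


def colour_of(name: str) -> str:
    """Colour for an entity; raises KeyError instead of falling back to a plot's default cycle."""
    return COLOURS[name]


for name in ENTITIES:
    print(f"{name:12s} {colour_of(name)}")
```

In Matplotlib, pass the result explicitly, e.g. `ax.plot(x, y, color=colour_of("Ours"), label="Ours")`, so "Ours" keeps the same colour in every figure regardless of plotting order.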

Other venues

ICLR 2024

7,404 submissions · 10 clusters


NeurIPS 2024

4,236 submissions · 10 clusters


TMLR

6,661 submissions · 10 clusters
