ICLR 2024

Flagship venue for representation learning. Single round, fully transparent review cycle.

OpenReview

7,404 visible submissions · Total exposed on OpenReview

39.1% acceptance rate

3,889 critiques analysed · Across 300 rejected papers

Patterns identified

Rating distribution

Scores reviewers assigned in their reviews.

Committee decisions

How final decisions are distributed across visible submissions.

Reject · 60.9%
Accept (poster) · 31.3%
Accept (spotlight) · 6.3%
Accept (oral) · 1.5%

Weakness map

Each bar is a recurrent pattern across the reviews of rejected papers. The width represents how much weight that pattern carries among all analysed critiques.

Unclear definitions and algorithms · 25.3% · 860 items
Insufficient datasets and training setup · 19.7% · 672 items
Unclear core contribution · 12.1% · 411 items
Baseline comparisons missing or weak · 9.1% · 309 items
Positioning against existing work · 7.3% · 248 items
Shallow experimental results · 7.4% · 253 items
Poorly structured related work · 6.5% · 221 items
Errors in sections and internal references · 6.3% · 215 items
Confusing figures and captions · 3.8% · 130 items
Real-world applicability questioned · 2.5% · 84 items
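
The same item count appears with two different percentages on this page (for example, 860 items is 25.3% in the weakness map and 22.1% in the pattern detail). The page does not say how either figure is computed. A minimal sketch, assuming the map divides by the sum of the ten pattern counts and the detail sections divide by the 3,889 critiques analysed, reproduces the displayed values closely; one displayed value (8% for 309 items) comes out as 7.9% under this assumption, so treat the reconstruction as approximate.

```python
# Item counts per pattern, copied from the weakness map.
counts = {
    "Unclear definitions and algorithms": 860,
    "Insufficient datasets and training setup": 672,
    "Unclear core contribution": 411,
    "Baseline comparisons missing or weak": 309,
    "Positioning against existing work": 248,
    "Shallow experimental results": 253,
    "Poorly structured related work": 221,
    "Errors in sections and internal references": 215,
    "Confusing figures and captions": 130,
    "Real-world applicability questioned": 84,
}

TOTAL_CRITIQUES = 3889  # "Critiques analysed" figure from the page header


def shares(item_counts, denominator):
    """Return each count as a percentage of `denominator`, rounded to one decimal."""
    return {name: round(100 * n / denominator, 1) for name, n in item_counts.items()}


# Assumption: the map normalises by the critiques that fell into one of the ten patterns.
clustered_total = sum(counts.values())
share_of_clustered = shares(counts, clustered_total)

# Assumption: "of total" in the detail sections means all analysed critiques.
share_of_all = shares(counts, TOTAL_CRITIQUES)

for name in counts:
    print(f"{name:45s} {share_of_clustered[name]:5.1f}%  {share_of_all[name]:5.1f}%")
```

Under these assumptions the ten patterns cover 3,403 of the 3,889 critiques, which would explain why the two sets of percentages differ.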

Patterns, one by one

Sorted roughly by weight (patterns #05 and #06 are nearly tied). For each pattern we show what it represents, how reviewers phrase it, and a practical takeaway you can apply before submitting your next paper.

#01

Unclear definitions and algorithms

22.1% of total · 860 items

Reviewers cannot reconstruct what the paper proposes from its formal description: incomplete notation, unstated assumptions, equations that do not match the text.

authors · equation · theorem · algorithm · eq · clear · defined · Section

How reviewers phrase it

One of the biggest weaknesses of the paper is that it does not properly place its results in the context of the existing literature. Once the indicator function w.r.t. the threshold has been defined the proof relies on existing techniques (...).

It is unclear from the description in Section 3 whether the network outputs are normalised to a probability simplex or treated as logits. The same symbol denotes both objects in different equations.

The proof of Theorem 2 invokes Lemma 1 with a slightly different set of assumptions; the authors should explicitly state which version they are using.

Practical takeaway. Before submitting, ask someone outside the project to read Section 3 only and reconstruct the algorithm. If they can't, rewrite.

#02

Insufficient datasets and training setup

17.3% of total · 672 items

Modelling and training decisions are under-justified: only one or two datasets are evaluated, hyperparameters are absent, the choice of base model is not discussed.

model · data · models · dataset · training · performance · different · datasets

How reviewers phrase it

Despite the promising idea, the authors do not convincingly and clearly show the value of the proposed metrics and the goal of the uncertainty analysis. First, the meaning of the metrics is not explained and must be found in the literature.

All experiments use the same backbone with default hyperparameters. The paper would be much stronger with a sensitivity study showing the method is not brittle to specific settings.

Only two datasets are evaluated; both are saturated benchmarks. A third, harder dataset would strengthen the case considerably.

Practical takeaway. Cover at least three diverse datasets. Report full hyperparameters in the appendix. Explicitly justify the choice of base model.
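
One way to make "report full hyperparameters in the appendix" cheap is to generate the appendix table from the same configuration object the training script uses, so the two cannot drift apart. The sketch below is illustrative only: `TrainConfig`, its fields, and every value in it are hypothetical placeholders, not settings from any analysed paper.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical fields and values; replace with the ones your experiments use.
    backbone: str = "resnet50"
    learning_rate: float = 3e-4
    batch_size: int = 128
    weight_decay: float = 0.01
    epochs: int = 90
    seeds: tuple = (0, 1, 2)


def to_latex_rows(config):
    """Render every field of the config as one `name & value \\\\` LaTeX table row."""
    rows = []
    for name, value in asdict(config).items():
        label = name.replace("_", r"\_")  # escape underscores for LaTeX
        rows.append(f"{label} & {value} \\\\")
    return rows


rows = to_latex_rows(TrainConfig())
print("\n".join(rows))
```

Because the rows come from `asdict`, a hyperparameter added to the config later shows up in the table without any manual editing.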

#03

Unclear core contribution

10.6% of total · 411 items

The paper is interesting but the reviewer cannot articulate what problem it solves that wasn't already solved, or what assumption it challenges.

paper · main · contribution · lacks · does · authors · provide · interesting

How reviewers phrase it

Despite the above strengths, this paper also has some drawbacks: I'm not sure I see a clear, single-sentence answer to what the main contribution is over the work cited as [4].

The paper has many interesting ideas, but the central claim is hidden across multiple sections. Pulling it into a single named contribution at the end of the introduction would help.

It is not easy to extract from the discussion which assumption of the prior work the authors are challenging.

Practical takeaway. Apply the one-sentence test: can you describe your contribution in a single sentence without using the word `novel`? If not, it isn't clear.

#04

Baseline comparisons missing or weak

8% of total · 309 items

Baselines are too few or not the strongest available. The paper fails to compare against the most relevant family of methods.

method · proposed · proposed method · paper · baselines · performance · authors · does

How reviewers phrase it

It is not easy to see in principle how the proposed method is superior to the probabilistic method.

The authors compare against three relatively old baselines. Several stronger 2023 methods are missing entirely.

Why was the comparison restricted to the single-modal setting? The natural baseline in this sub-area is multi-modal.

Practical takeaway. List the three strongest baselines in your sub-area before drafting the results table. If your method doesn't beat them, say so explicitly and argue why your approach is better on another axis.

#05

Positioning against existing work

6.4% of total · 248 items

The paper is framed as a `unified framework` or `general approach`, but reviewers see that it only covers a specific sub-area. The related work section is incomplete.

methods · existing · comparison · existing methods · method · baselines · authors · paper

How reviewers phrase it

Though the authors claim that they aim to propose a unified framework, the methods considered in their paper are mainly based on AM and POMO, in other words, the auto-regressive methods. As far as I know, there are also other methods.

The paper claims to apply broadly but only validates on one task family. Either evaluate on a second family or restrict the framing.

Several recent works on this exact problem are not cited; the contribution would look different in their light.

Practical takeaway. If you claim generality, prove it: include at least one experiment outside the original niche. Otherwise, soften the language.

#06

Shallow experimental results

6.5% of total · 253 items

Tables exist but the analysis is thin: no error bars, no significance tests, no visualisations that tell the story.

results · experimental · experimental results · paper · authors · experiments · empirical · table

How reviewers phrase it

Visualization results are encouraged to be included.

Table 2 reports point estimates only. Without error bars it is impossible to tell whether the differences are significant.

The empirical evaluation is described in two paragraphs but the underlying methodology — number of seeds, splits, evaluation protocol — is not clear.

Practical takeaway. Pair every table with at least one figure that illustrates the key difference. Report confidence intervals, not just means.
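
Reporting an interval rather than a bare mean takes only a few lines. The sketch below uses a percentile bootstrap over per-seed scores with the Python standard library; the function name `bootstrap_ci` and the accuracy values are hypothetical, and with only a handful of seeds the resulting interval is a rough indication rather than a precise one.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the reported interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1.0 - alpha) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical accuracies from five training seeds; not taken from any paper.
accuracies = [0.812, 0.805, 0.821, 0.798, 0.816]
mean, lower, upper = bootstrap_ci(accuracies)
print(f"accuracy = {mean:.3f} (95% CI [{lower:.3f}, {upper:.3f}])")
```

Running the same function on the baseline's per-seed scores and checking whether the two intervals overlap gives reviewers the comparison that point estimates alone cannot.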

#07

Poorly structured related work

5.7% of total · 221 items

Section 2 lists prior work but does not compare entries against each other or relate them to the paper's contribution. A `wall of citations`.

work · related · related work · works · related works · novelty · authors · previous

How reviewers phrase it

The related work section needs reorganization. The current version simply decomposes the framework into related parts and introduces the works one by one. It's hard to grasp the major contribution.

Many cited works are summarised but not contrasted. Add a one-sentence comparison per cluster of related work.

The novelty argument depends on a comparison the section does not make explicit.

Practical takeaway. Structure related work by dimensions (not by papers). Each subsection should end with `leaving a gap we close with X`.

#08

Errors in sections and internal references

5.5% of total · 215 items

Small errors that break the read: equations with swapped indices, internal references to wrong sections, undefined acronyms.

section · authors · used · paragraph · paper · section authors · clarity · experiments

How reviewers phrase it

Section 4.2. r^{i}|wrap(I^{i-1}, m^{i->i-1})-I^{i}| should be r^{i}|wrap(I^{i-1}, m^{i-1->i})-I^{i}| ?

many typos, e.g. section 5, `We also e introduce`, what is the e for?

Eq. (8) references Lemma 3 but the lemma is numbered 4 in the appendix.

Practical takeaway. A proof-reading pass dedicated to `consistency` (not `grammar`) catches most of these. Do it in the final days before the deadline, not while you are still writing.
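
Part of a consistency pass can be automated. The sketch below, assuming the paper is written in LaTeX, lists cross-references that point at labels which do not exist and labels that are never referenced. The function name `check_references` and the sample text are hypothetical, and the check covers only `\ref`-style commands; it cannot catch numbers typed by hand such as "Lemma 3".

```python
import re

LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
REF_RE = re.compile(r"\\(?:ref|eqref|autoref|cref|Cref)\{([^}]+)\}")


def check_references(tex):
    """Return label keys that are referenced but undefined, and defined but unused."""
    labels = set(LABEL_RE.findall(tex))
    referenced = set()
    for group in REF_RE.findall(tex):
        # \cref-style commands accept several comma-separated keys.
        referenced.update(key.strip() for key in group.split(","))
    return {
        "undefined": sorted(referenced - labels),
        "unused": sorted(labels - referenced),
    }


# Hypothetical LaTeX fragment with one mistyped label key.
sample = r"""
\section{Method}\label{sec:method}
By Lemma~\ref{lem:bound} and Eq.~\eqref{eq:loss} the claim follows.
\begin{equation}\label{eq:loss} L = 0 \end{equation}
\begin{lemma}\label{lem:bounds} Statement. \end{lemma}
"""
report = check_references(sample)
print(report)
```

An unused label sitting next to an undefined reference with a similar key, as in the sample, is usually a typo in one of the two.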

#09

Confusing figures and captions

3.3% of total · 130 items

Too many figures, weak captions, missing units, legends unreadable at print size. The visual narrative takes more effort than it should.

figure · figures · caption · results · figure figure · minor · text · unclear

How reviewers phrase it

I recommend the author combine Figure 3 and Figure 4 into one line such that a lot of space can be saved.

The legend in Fig. 5 is unreadable at the printed size; use thicker lines and contrasting markers.

Captions assume the reader has the body text in front of them. Make them self-contained.

Practical takeaway. Every figure should stand on its own, with a self-contained caption. If Fig. 3 and Fig. 4 say the same thing, merge them.

#10

Real-world applicability questioned

2.2% of total · 84 items

The method is evaluated on clean synthetic settings. Reviewers ask how it behaves under noisy data, distribution shift, or a real system beyond the pipeline.

real · real world · world · applications · scenarios · world applications · world scenarios · data

How reviewers phrase it

If the current setting is realistic, I suggest showing the effectiveness of attacking the real-world system.

Synthetic results are convincing; a single real-world deployment would change the contribution from theoretical to practical.

How does the method degrade under realistic noise levels? Reporting only the clean setting limits the impact.

Practical takeaway. If your method is production-viable, prove it with a dirty-data experiment. Otherwise, honestly bound the scope as fundamental research.
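
A dirty-data experiment can start as a simple sweep: corrupt the evaluation inputs at increasing noise levels and report the metric at each level. The sketch below shows only the shape of such a sweep. The helper `accuracy_under_noise`, the stand-in model `toy_predict`, and the synthetic data are all hypothetical, and additive Gaussian noise is just one assumed corruption among many.

```python
import random


def accuracy_under_noise(predict, inputs, labels, sigma, seed=0):
    """Accuracy of `predict` when Gaussian noise of std `sigma` is added to each feature."""
    rng = random.Random(seed)  # fixed seed so every noise level is reproducible
    correct = 0
    for x, y in zip(inputs, labels):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        correct += int(predict(noisy) == y)
    return correct / len(labels)


def toy_predict(x):
    """Hypothetical stand-in for a trained model: class 1 if the feature sum is positive."""
    return 1 if sum(x) > 0 else 0


# Synthetic, perfectly separable data: four features at +1 (class 1) or -1 (class 0).
data_rng = random.Random(1)
labels = [data_rng.randint(0, 1) for _ in range(500)]
inputs = [[1.0 if y == 1 else -1.0 for _ in range(4)] for y in labels]

results = {}
for sigma in (0.0, 0.5, 1.0, 2.0, 4.0):
    results[sigma] = accuracy_under_noise(toy_predict, inputs, labels, sigma)
    print(f"sigma={sigma:.1f}  accuracy={results[sigma]:.3f}")
```

Reporting the whole curve, rather than the clean setting alone, answers the degradation question reviewers raise above.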
