How final decisions are distributed across visible submissions.
Reject: 60.9%
Accept (poster): 31.3%
Accept (spotlight): 6.3%
Accept (oral): 1.5%
Weakness map
Each bar is a recurrent pattern across the reviews of rejected papers. The width represents how much weight that pattern carries among all analysed critiques.
Unclear definitions and algorithms: 25.3% (860 items)
Insufficient datasets and training setup: 19.7% (672 items)
Unclear core contribution: 12.1% (411 items)
Baseline comparisons missing or weak: 9.1% (309 items)
Positioning against existing work: 7.3% (248 items)
Shallow experimental results: 7.4% (253 items)
Poorly structured related work: 6.5% (221 items)
Errors in sections and internal references: 6.3% (215 items)
Confusing figures and captions: 3.8% (130 items)
Real-world applicability questioned: 2.5% (84 items)
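The map percentages appear to be each pattern's item count divided by the sum over the ten listed patterns. A minimal Python sketch that reproduces them from the counts shown above (the names `counts` and `shares` are illustrative, not part of the original analysis):

```python
# Item counts per pattern, copied from the weakness map.
counts = {
    "Unclear definitions and algorithms": 860,
    "Insufficient datasets and training setup": 672,
    "Unclear core contribution": 411,
    "Baseline comparisons missing or weak": 309,
    "Positioning against existing work": 248,
    "Shallow experimental results": 253,
    "Poorly structured related work": 221,
    "Errors in sections and internal references": 215,
    "Confusing figures and captions": 130,
    "Real-world applicability questioned": 84,
}


def shares(counts):
    """Return each pattern's share of the listed items, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in shares(counts).items():
        print(f"{name}: {pct}%")
```

The per-pattern figures further down (for example 22.1% of total for the same 860 items) are lower. That suggests they use a larger denominator that also counts critiques outside these ten patterns, though the page does not say so explicitly.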
Patterns, one by one
Sorted by weight. For each pattern we show what it represents, how reviewers phrase it, and a practical takeaway you can apply before submitting your next paper.
#01
Unclear definitions and algorithms
22.1% of total (860 items)
Reviewers cannot reconstruct what the paper proposes from its formal description: incomplete notation, unstated assumptions, equations that do not match the text.
One of the biggest weaknesses of the paper is that it does not properly place its results in the context of the existing literature. Once the indicator function w.r.t. the threshold has been defined the proof relies on existing techniques (...).
It is unclear from the description in Section 3 whether the network outputs are normalised to a probability simplex or treated as logits. The same symbol denotes both objects in different equations.
The proof of Theorem 2 invokes Lemma 1 with a slightly different set of assumptions; the authors should explicitly state which version they are using.
Practical takeaway. Before submitting, ask someone outside the project to read Section 3 only and reconstruct the algorithm. If they can't, rewrite.
#02
Insufficient datasets and training setup
17.3% of total (672 items)
Modelling and training decisions are under-justified: only one or two datasets are evaluated, hyperparameters are absent, the choice of base model is not discussed.
Despite the promising idea, the authors do not convincingly and clearly show the value of the proposed metrics and the goal of the uncertainty analysis. First, the meaning of the metrics is not explained and must be found in the literature.
All experiments use the same backbone with default hyperparameters. The paper would be much stronger with a sensitivity study showing the method is not brittle to specific settings.
Only two datasets are evaluated; both are saturated benchmarks. A third, harder dataset would strengthen the case considerably.
Practical takeaway. Cover at least three diverse datasets. Report full hyperparameters in the appendix. Explicitly justify the choice of base model.
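One low-effort way to report full hyperparameters is to keep them in a single config object and generate the appendix table from it, so the table cannot drift from what was actually run. A minimal sketch, assuming a LaTeX manuscript; every field name and value below is a hypothetical placeholder:

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical fields and values, for illustration only.
    backbone: str = "resnet50"
    learning_rate: float = 3e-4
    batch_size: int = 128
    epochs: int = 90
    weight_decay: float = 0.01
    seed: int = 0


def to_latex_table(cfg):
    """Render every field of the config as a two-column LaTeX tabular."""
    rows = [
        f"{name.replace('_', ' ')} & {value} \\\\"
        for name, value in asdict(cfg).items()
    ]
    header = [r"\begin{tabular}{ll}", r"Hyperparameter & Value \\", r"\hline"]
    return "\n".join(header + rows + [r"\end{tabular}"])


if __name__ == "__main__":
    print(to_latex_table(TrainConfig()))
```

Because the table is built from every field of the dataclass, adding a new hyperparameter to the training code adds it to the appendix automatically.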
#03
Unclear core contribution
10.6% of total (411 items)
The paper is interesting but the reviewer cannot articulate what problem it solves that wasn't already solved, or what assumption it challenges.
Despite the above strengths, this paper also has some drawbacks: I'm not sure I see a clear, single-sentence answer to what the main contribution is over the work cited as [4].
The paper has many interesting ideas, but the central claim is hidden across multiple sections. Pulling it into a single named contribution at the end of the introduction would help.
It is not easy to extract from the discussion which assumption of the prior work the authors are challenging.
Practical takeaway. Apply the one-sentence test: can you describe your contribution in a single sentence without using the word `novel`? If not, the contribution isn't clear yet.
#04
Baseline comparisons missing or weak
8% of total (309 items)
Baselines are too few or not the strongest available. The paper fails to compare against the most relevant family of methods.
It is not easy to see in principle how the proposed method is superior to the probabilistic method.
The authors compare against three relatively old baselines. Several stronger 2023 methods are missing entirely.
Why was the comparison restricted to the single-modal setting? The natural baseline in this sub-area is multi-modal.
Practical takeaway. List the three strongest baselines in your sub-area before drafting the results table. If your method doesn't beat them, say so explicitly and argue why your approach is better on another axis.
#05
Positioning against existing work
6.4% of total (248 items)
The paper is framed as a `unified framework` or `general approach`, but reviewers see that it only covers a specific sub-area. The related work section is incomplete.
Though the authors claim that they aim to propose a unified framework, the methods considered in their paper are mainly based on AM and POMO, in other words, the auto-regressive methods. As far as I know, there are also other methods.
The paper claims to apply broadly but only validates on one task family. Either evaluate on a second family or restrict the framing.
Several recent works on this exact problem are not cited; the contribution would look different in their light.
Practical takeaway. If you claim generality, prove it: include at least one experiment outside the original niche. Otherwise, soften the language.
#06
Shallow experimental results
6.5% of total (253 items)
Tables exist but the analysis is thin: no error bars, no significance tests, no visualisations that tell the story.
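Error bars and a significance test need little more than per-seed scores. A minimal sketch using only the Python standard library: it reports mean and sample standard deviation across seeds, then runs an exact paired sign-flip permutation test on the per-seed differences. The scores are made-up illustration data, and the sign-flip test is one reasonable choice among several, not a method prescribed by the reviews.

```python
from itertools import product
from statistics import mean, stdev


def summarize(scores):
    """Mean and sample standard deviation across seeds."""
    return mean(scores), stdev(scores)


def paired_sign_flip_p(a, b):
    """Exact two-sided paired sign-flip permutation test on mean(a - b)."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    # Enumerate all 2^n sign assignments; feasible for a handful of seeds.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / total


if __name__ == "__main__":
    # Hypothetical accuracies over five seeds (same seeds for both methods).
    ours = [0.812, 0.807, 0.819, 0.815, 0.810]
    base = [0.801, 0.799, 0.808, 0.803, 0.802]
    for name, scores in (("ours", ours), ("baseline", base)):
        m, s = summarize(scores)
        print(f"{name}: {m:.3f} +/- {s:.3f}")
    print("two-sided p-value:", paired_sign_flip_p(ours, base))
```

With n paired seeds the smallest two-sided p-value this test can return is 2 / 2^n, so five seeds can never go below 0.0625. That alone is an argument for running more seeds before claiming significance.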
#07
Poorly structured related work
221 items
The related work section needs reorganization. The current version simply decomposes the framework into related parts and introduces the works one by one. It's hard to grasp the major contribution.
Many cited works are summarised but not contrasted. Add a one-sentence comparison per cluster of related work.
The novelty argument depends on a comparison the section does not make explicit.
Practical takeaway. Structure related work by dimensions (not by papers). Each subsection should end with `leaving a gap we close with X`.
#08
Errors in sections and internal references
5.5% of total (215 items)
Small errors that break the read: equations with swapped indices, internal references to wrong sections, undefined acronyms.
Section 4.2. r^{i}|wrap(I^{i-1}, m^{i->i-1})-I^{i}| should be r^{i}|wrap(I^{i-1}, m^{i-1->i})-I^{i}| ?
many typos, e.g. section 5, `We also e introduce`, what is the e for?
Eq. (8) references Lemma 3 but the lemma is numbered 4 in the appendix.
Practical takeaway. A proof-reading pass dedicated to `consistency` (not `grammar`) catches most of these. Do it in the final days before the deadline, not while you are still writing.
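Part of a consistency pass can be automated. A minimal sketch, assuming the manuscript is written in LaTeX: it lists reference keys that have no matching `\label`. The function name and the set of reference commands covered are illustrative choices.

```python
import re

LABEL = re.compile(r"\\label\{([^}]+)\}")
REF = re.compile(r"\\(?:ref|eqref|autoref|cref|Cref)\{([^}]+)\}")


def dangling_references(tex):
    """Return referenced keys that have no matching label, sorted."""
    labels = set(LABEL.findall(tex))
    refs = set()
    for group in REF.findall(tex):
        # cleveref-style commands may list several comma-separated keys.
        refs.update(key.strip() for key in group.split(","))
    return sorted(refs - labels)


if __name__ == "__main__":
    sample = (
        r"\section{Method}\label{sec:method} "
        r"See \eqref{eq:loss} and \cref{sec:method,lem:bound}. "
        r"\begin{equation}\label{eq:loss} L = 0 \end{equation}"
    )
    print(dangling_references(sample))
```

A check like this only finds references to labels that do not exist. A reference that resolves to the wrong lemma or section still needs a human read.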
#09
Confusing figures and captions
3.3% of total (130 items)
Too many figures, weak captions, missing units, legends unreadable at print size. The visual narrative takes more effort than it should.
I recommend the author combine Figure 3 and Figure 4 into one line such that a lot of space can be saved.
The legend in Fig. 5 is unreadable at the printed size; use thicker lines and contrasting markers.
Captions assume the reader has the body text in front of them. Make them self-contained.
Practical takeaway. Every figure should stand on its own, with a self-contained caption. If Fig. 3 and Fig. 4 say the same thing, merge them.
#10
Real-world applicability questioned
2.2% of total (84 items)
The method is evaluated on clean synthetic settings. Reviewers ask how it behaves under noisy data, distribution shift, or a real system beyond the pipeline.
If the current setting is realistic, I suggest showing the effectiveness of attacking the real-world system.
Synthetic results are convincing; a single real-world deployment would change the contribution from theoretical to practical.
How does the method degrade under realistic noise levels? Reporting only the clean setting limits the impact.
Practical takeaway. If your method is production-viable, prove it with a dirty-data experiment. Otherwise, honestly bound the scope as fundamental research.
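A dirty-data experiment can start as a simple noise sweep: evaluate the same trained model on inputs corrupted at increasing noise levels and report the degradation curve. A minimal sketch, assuming additive Gaussian noise on scalar inputs; `predict` is a hypothetical stand-in for a real model and the data is synthetic.

```python
import random


def predict(x):
    """Hypothetical stand-in for a trained model: a fixed threshold at 0."""
    return 1 if x > 0 else 0


def accuracy_under_noise(inputs, labels, sigma, seed=0):
    """Accuracy after adding zero-mean Gaussian noise with std sigma."""
    rng = random.Random(seed)
    correct = 0
    for x, y in zip(inputs, labels):
        noisy = x + rng.gauss(0.0, sigma)
        correct += predict(noisy) == y
    return correct / len(inputs)


def noise_sweep(inputs, labels, sigmas):
    """Map each noise level to the accuracy measured at that level."""
    return {s: accuracy_under_noise(inputs, labels, s) for s in sigmas}


if __name__ == "__main__":
    data_rng = random.Random(1)
    inputs = [data_rng.uniform(-3.0, 3.0) for _ in range(1000)]
    labels = [predict(x) for x in inputs]  # clean labels by construction
    for sigma, acc in noise_sweep(inputs, labels, [0.0, 0.5, 1.0, 2.0]).items():
        print(f"sigma={sigma}: accuracy={acc:.3f}")
```

Reporting the whole curve, not just the clean point at sigma = 0, answers the reviewer question about behaviour under realistic noise levels directly.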