OpenCódice
Blog
Francisco-Javier Rodrigo-Ginés & Jorge Chamorro-Padial

Five MCP Servers for the Academic Stack: How We Built the OpenCódice Family

The technical detail of how we cover peer review, integrity, author identity, CS bibliography, and research artifacts, and why an agent with access to all five can reason about a paper as a unit of research.

Tags: mcp, agentic-ai, scholarly-infrastructure, openreview, retraction-watch, orcid, dblp, zenodo, opencodice

Two months ago we shipped openreview-mcp, the first MCP server that exposes peer review (OpenReview reviews, area-chair meta-reviews, final decisions) to a language model. Today we are releasing four more: retractionwatch-mcp, orcid-mcp, dblp-mcp, and zenodo-mcp. Together with the first, the family closes a loop: an LLM can now reason about a paper as a unit of research, rather than as a bundle of disconnected metadata.

This post explains how we built them, what we learned, what worked on the first try (not much), and what we had to rebuild (quite a lot).

The gap we were filling

Today there are public MCP servers for almost everything a researcher touches daily: arXiv, Semantic Scholar, Crossref, ACL Anthology, Hugging Face, GitHub. They all share an implicit hypothesis: an academic agent is a discovery agent. The question they answer is always the same: "find me something to read".

But a researcher does not only discover. They also verify, reproduce, cite carefully, evaluate co-authors, write rebuttals, deposit datasets. When an agent connected to Claude (or to the API, or to a custom client) tries to do any of those things, it runs out of tools. If the agent cites a retracted paper, it has no way to know. If two "J. Smith" entries are two different people, it has no way to know. If a paper has an accompanying Zenodo dataset under CC-BY, the agent only knows the paper exists and, with luck, its DOI.

The family of five servers we are finishing today fills that gap.

| Server | Covers | Differentiator |
|---|---|---|
| openreview-mcp | Peer review (reviews, meta-reviews, decisions, rebuttals) | aggregate_weaknesses: clusters recurring complaints across rejections |
| retractionwatch-mcp | Post-publication integrity (retractions, withdrawals, expressions of concern) | Via Crossref + Retraction Watch CC-BY since Sept. 2023 |
| orcid-mcp | Canonical author identity (profile, works, affiliations, search) | Disambiguation heuristic with token-overlap scoring |
| dblp-mcp | Hand-curated CS bibliography (authors, venues, publications) | XML parser for /pid/X/Y.xml (DBLP exposes no JSON there) |
| zenodo-mcp | Artifacts with permanent DOIs (datasets, software, supplementary) | Version traversal via concept-recid |

All five are MIT-licensed, all five install with pip install <name>, and all five follow the same design pattern.

The shared design pattern

The elegance of the Model Context Protocol is that a good MCP server is very thin. Each of our five servers has exactly the same layered structure:

src/<package>/
├── client.py     — httpx wrapper over the upstream API
├── cache.py      — on-disk TTL with diskcache, key by stable hash
├── schemas.py    — Pydantic v2 models, flat
├── tools/
│   ├── _helpers.py  — domain-specific normalisers
│   └── *.py         — pure functions grouped by entity
├── server.py     — FastMCP, one @mcp.tool() per function
└── cli.py        — argparse, stdio or streamable-http transport

Four decisions guided us:

1. Evidence, not policy. Every tool returns structured evidence (a RetractionFlag with an explicit update_type, a DisambiguationCandidate with a numeric score, a list of WeaknessCluster with their exemplars). The policy is applied by the calling agent. If the user wants a strict configuration (reject any flag that is not a "correction"), or a permissive one (only block formal retractions), or a monitoring dashboard, the server does not change. Baking a policy in would make it brittle.

2. Pydantic v2 for everything that goes out. Academic APIs are irregular: Crossref wraps dates as date-time or date-parts depending on the field; ORCID hides values behind {value: ...}; DBLP sometimes serialises aliases.alias as an object, sometimes as a list. The schema layer is where we promise what we return; that is where we catch drifts before they reach the agent.

3. Aggressive caching with an env-var bypass. Most academic data changes slowly (a retraction is published once; an ORCID profile is updated every few months; a DBLP page is reindexed weekly). We cache to disk with a per-tool TTL and give the user an environment variable (<NAME>_MCP_NO_CACHE=1) to skip it for debugging. The result: the second pass of a sweep over a .bib file takes seconds.

4. Offline tests against fakes. Each repo carries a conftest.py with a _FakeClient that serves pre-baked JSON responses indexed by URL path. The 41 tests across the four new servers run in under two seconds without touching the network. Offline tests, however, do not catch every bug (see the next section).
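Decision 3 can be sketched with the standard library alone. This is an illustration, not the servers' code: they cache on disk with diskcache, while the in-memory stand-in below only shows the three moving parts named above (a stable hash key, a per-tool TTL, and an environment-variable bypass). The names `stable_key`, `TTLCache`, and `EXAMPLE_MCP_NO_CACHE` are hypothetical.

```python
import hashlib
import json
import os
import time
from typing import Any, Callable


def stable_key(tool: str, params: dict) -> str:
    """Hash the tool name plus its parameters; sorting keys makes argument order irrelevant."""
    payload = json.dumps({"tool": tool, "params": params}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for an on-disk cache with per-entry expiry (hypothetical)."""

    def __init__(self, bypass_env: str = "EXAMPLE_MCP_NO_CACHE",
                 clock: Callable[[], float] = time.time) -> None:
        self._store: dict = {}          # key -> (expiry timestamp, cached value)
        self._bypass_env = bypass_env   # env var that disables caching when set to "1"
        self._clock = clock             # injectable so tests can control time

    def get_or_fetch(self, tool: str, params: dict, ttl_seconds: float,
                     fetch: Callable[[], Any]) -> Any:
        # Debugging escape hatch: skip the cache entirely.
        if os.environ.get(self._bypass_env) == "1":
            return fetch()
        key = stable_key(tool, params)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        # Miss or expired entry: call upstream and remember the result.
        value = fetch()
        self._store[key] = (now + ttl_seconds, value)
        return value
```

A tool function would wrap its upstream call in `get_or_fetch`, passing a TTL that matches how slowly its data changes (long for retractions, shorter for search results).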

The bugs that only show up in production

Here is what we learned building the five. Each gotcha only surfaced when we ran a smoke test against the real API.

Crossref: sort=posted does not exist

retractionwatch_mcp.tools.search.recent_retractions is meant to return the most recent retractions ordered by the publication date of the notice. The natural intuition is sort=posted&order=desc. Crossref answers with a curt HTTP 400 Bad Request. The date-based values it does accept include created (deposit date in Crossref) and updated (last metadata modification). For retractions, updated is closest to the "notice date" intent, so that is the sort in the published version.

Lesson: large APIs document their accepted values, but the documentation often lags the behaviour, and intuition is no substitute for either. Offline tests do not catch server-side rejections.
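One cheap guard is to validate the sort key client-side before the request leaves the process. The sketch below is not the retractionwatch-mcp implementation. The function name is hypothetical, and both the list of sort keys and the `update-type:retraction` filter are assumptions based on our reading of the Crossref REST API documentation, so check them against the live API before relying on them.

```python
from urllib.parse import urlencode

# Sort keys we believe Crossref's /works route accepts (assumption: taken
# from the public REST API docs; the live API is the final authority).
CROSSREF_SORT_KEYS = frozenset({
    "score", "relevance", "updated", "deposited", "indexed", "published",
    "published-print", "published-online", "issued",
    "is-referenced-by-count", "references-count", "created",
})


def recent_retractions_url(rows: int = 20, sort: str = "updated") -> str:
    """Build a /works query for retraction notices, newest first (hypothetical helper)."""
    if sort not in CROSSREF_SORT_KEYS:
        # Fail locally with a readable message instead of an upstream HTTP 400.
        raise ValueError(f"unsupported Crossref sort key: {sort!r}")
    query = urlencode({
        "filter": "update-type:retraction",  # assumed filter name and value
        "sort": sort,
        "order": "desc",
        "rows": rows,
    })
    return f"https://api.crossref.org/works?{query}"
```

The allow-list can itself drift, which is why it complements a smoke test against the real API and does not replace it.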

ORCID: /search only returns identifiers

ORCID exposes two search endpoints. /search returns, literally, this:

{
  "result": [{"orcid-identifier": {"path": "0000-0001-9084-8782"}}],
  "num-found": 1771
}

Just the iD, no names, no affiliations. To resolve each hit into something useful you have to fire a second /<orcid>/person call per candidate; on a 100-hit search, that is one hundred round-trips.

/expanded-search (which ORCID documents in tutorials but does not mention in its official SDK) inlines name and primary affiliation in each hit:

{
  "expanded-result": [{
    "orcid-id": "0000-0002-9322-3515",
    "given-names": "Yoshua",
    "family-names": "Bengio",
    "institution-name": ["CIFAR"]
  }]
}

orcid-mcp uses /expanded-search by default, with a fallback to /search in case ORCID changes the endpoint. Another lesson: official SDKs do not always expose the best parts of the API.
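The two response shapes shown above can be flattened into one candidate format, which is what makes the fallback transparent to the caller. This is a minimal sketch under the assumption that the payloads look exactly like the two examples in this section. The function names and the output keys are hypothetical, not the orcid-mcp schema.

```python
from typing import Any


def flatten_expanded_hits(payload: dict) -> list:
    """Flatten an /expanded-search payload: name and institutions arrive inline."""
    return [
        {
            "orcid": hit.get("orcid-id"),
            "given_names": hit.get("given-names"),
            "family_names": hit.get("family-names"),
            "institutions": list(hit.get("institution-name") or []),
        }
        for hit in payload.get("expanded-result") or []
    ]


def flatten_basic_hits(payload: dict) -> list:
    """Flatten a /search payload: only the iD is known, everything else is empty."""
    return [
        {
            "orcid": (hit.get("orcid-identifier") or {}).get("path"),
            "given_names": None,
            "family_names": None,
            "institutions": [],
        }
        for hit in payload.get("result") or []
    ]


def flatten_search(payload: dict) -> list:
    """Pick the right flattener from the payload's top-level key."""
    if "expanded-result" in payload:
        return flatten_expanded_hits(payload)
    return flatten_basic_hits(payload)
```

Candidates that come back through the fallback path have `given_names` set to `None`, which tells the caller that a per-candidate /person lookup is still needed.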

DBLP: author pages are XML, not JSON

DBLP supports JSON for the search endpoints (/search/{publ,author,venue}/api?format=json). By symmetry, we assumed /pid/<pid>.json would work too. It returns HTTP 404 Not Found for every PID we tried. Author pages are served only as XML (.xml) or BibTeX (.bib); there is no JSON endpoint.

The fix is a forty-line stdlib XML parser (xml.etree.ElementTree) that maps each <r> to the same flat Publication the rest of the server produces. No new dependencies.
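A parser in that spirit might look like the sketch below. It is not the dblp-mcp code: the output keys are hypothetical, and the element names it reads (`<r>` wrapping one record element with `<author>`, `<title>`, `<year>`, `<journal>` or `<booktitle>`, and `<ee>` children) are assumptions about the author-page XML.

```python
import xml.etree.ElementTree as ET


def parse_author_page(xml_text: str) -> list:
    """Map each <r> wrapper on a DBLP author page to one flat publication dict."""
    root = ET.fromstring(xml_text)
    publications = []
    for wrapper in root.iter("r"):
        if len(wrapper) == 0:
            continue  # empty wrapper: nothing to map
        entry = wrapper[0]  # e.g. <article> or <inproceedings>
        title = entry.find("title")
        year = entry.findtext("year")
        publications.append({
            "key": entry.get("key"),
            "type": entry.tag,
            # Titles may carry inline markup such as <i>; itertext() flattens it.
            "title": "".join(title.itertext()).strip() if title is not None else None,
            "year": int(year) if year and year.isdigit() else None,
            "venue": entry.findtext("journal") or entry.findtext("booktitle"),
            "authors": [a.text for a in entry.findall("author") if a.text],
            "ee": entry.findtext("ee"),
        })
    return publications
```

Every field other than the record type falls back to `None` or an empty list when the element is missing, so the output stays flat and predictable even for sparse records.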

There is a second DBLP bug that also only surfaced in production: the numeric @id returned in each hit of search/author/api (e.g. 308198) is not the PID. The persistent PID lives in info.url in the form https://dblp.org/pid/56/953. You have to extract the slug from the URL. If you keep @id, every call to /pid/<id>.xml returns 404.
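Extracting the slug is a small piece of URL handling. The helper below is a sketch with a hypothetical name. It assumes the hit carries `info.url` in the form shown above and tolerates an optional file extension or trailing slash.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Matches /pid/<slug>, optionally followed by a file extension or a trailing slash.
_PID_PATH = re.compile(r"^/pid/(?P<pid>.+?)(?:\.(?:html|xml|bib|rdf))?/?$")


def pid_from_hit(hit: dict) -> Optional[str]:
    """Return the persistent PID from a search hit's info.url, ignoring the numeric @id."""
    url = (hit.get("info") or {}).get("url") or ""
    match = _PID_PATH.match(urlparse(url).path)
    return match.group("pid") if match else None
```

Returning `None` when the URL does not match keeps a malformed hit from turning into a guaranteed 404 on the next call.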

Zenodo: published-record buckets are locked

This one caught us at the end. We uploaded the four papers to Zenodo, captured the DOIs, and then realised the PDFs we had uploaded still followed the old template (not the OpenCódice one) and the author affiliations included UNED and Lleida (things we did not want). The natural fix would be to edit the records and replace the files.

Zenodo has an explicit policy: published-record buckets are write-locked. Although the actions/edit flow lets you modify metadata, you cannot replace files. The only way to upload a new PDF is to mint a new version via actions/newversion, which creates a draft with a fresh bucket. The new version receives a new version-DOI; the concept-DOI is preserved.

It is the right call by Zenodo (published records are citable and must not mutate under whoever cited them), but it forces you to think in versions from minute one. The v1 of each of our papers exists in Zenodo and always will; the v2 (the one that counts) has its own DOI under the same concept-DOI.
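For a reader of the API, thinking in versions means grouping records by concept and picking the newest one. The sketch below is illustrative and is not the zenodo-mcp traversal code. The function name is hypothetical, and it assumes each record dict carries a `conceptrecid` and an ISO-8601 `created` timestamp in a uniform format, so that string comparison orders versions correctly.

```python
from collections import defaultdict


def latest_per_concept(records: list) -> dict:
    """Group records by concept id and keep the most recently created version of each."""
    by_concept = defaultdict(list)
    for record in records:
        # Every version of a work shares one concept id (assumed field name).
        by_concept[str(record["conceptrecid"])].append(record)
    return {
        concept: max(versions, key=lambda r: r["created"])
        for concept, versions in by_concept.items()
    }
```

An agent that cites a version-DOI can use the same grouping to notice that a newer version exists under the same concept.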

The family in use: the five servers reasoning together

The kind of question that makes sense to ask a single server is small: give me the reviews of this paper, is it retracted, what is this person's ORCID. The kind of question that makes sense when all five are connected is very different. An example we ran while drafting the technical reports:

# The agent reasons in steps. It starts from a DOI:
doi = "10.1016/j.eswa.2023.121640"

# 1. Crossref/OpenAlex resolves basic metadata (the `metadata` object used below).
# 2. retractionwatch-mcp checks integrity.
flagged = rw_check_doi(doi)
assert not flagged["is_flagged"]

# 3. orcid-mcp resolves the authors to canonical IDs.
authors = [orcid_search_by_name(family_name=..., given_names=...) for a in metadata.authors]

# 4. dblp-mcp pulls the lead author's DBLP page to
#    contextualise (which venues they care about, how many preprints).
stats = dblp_author_stats(authors[0]["pid"])

# 5. zenodo-mcp looks for the accompanying dataset.
dataset = zenodo_search_by_creator(name=metadata.first_author_name)

In five calls to five different servers, the agent has: the paper, its integrity status, the canonical authors, the bibliometric context, and the artifacts. This is reasoning about a paper as a unit of research. Until two weeks ago, no agent connected to an LLM could do this without bespoke code.

The drafting of this very post was a test in itself: we resolved the ORCID of the second author (Jorge) by pointing orcid-mcp at his name with the affiliation hint "OpenCódice". The first match was exact. Two calls, under a second, no hardcoding. The server this post documents was used to write it.
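A token-overlap score with an affiliation hint can be sketched as follows. This is not the heuristic orcid-mcp ships. The Jaccard measure, the accent folding, the 0.3 hint weight, and all function names are assumptions chosen for illustration.

```python
import re
import unicodedata


def _tokens(text: str) -> set:
    """Lower-case, strip accents, and split on anything that is not a letter or digit."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return set(re.findall(r"[a-z0-9]+", folded.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings (0.0 if either is empty)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score_candidate(query_name: str, candidate_name: str,
                    affiliation_hint: str = "", institutions=(),
                    hint_weight: float = 0.3) -> float:
    """Blend name overlap with the best affiliation overlap when a hint is given."""
    name_score = overlap(query_name, candidate_name)
    if not affiliation_hint:
        return name_score
    hint_score = max((overlap(affiliation_hint, inst) for inst in institutions), default=0.0)
    return (1 - hint_weight) * name_score + hint_weight * hint_score
```

The function returns a number and nothing else. Whether a given score counts as "the same person" is a threshold for the calling agent to choose.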

Release scheduling

A strategic decision we have been fairly explicit about: the five do not ship on the same day. Each release deserves its own visibility window, and four releases at once fragment the target audience's attention.

| Date | Server | Why this order |
|---|---|---|
| 2026-04-25 | openreview-mcp | Done. The strongest viral hook (peer review as a resource for LLMs) |
| 2026-05-19 | retractionwatch-mcp | Natural complement to openreview: "the reviews tell you what they thought, retractionwatch tells you if it survived" |
| 2026-06-02 | orcid-mcp | Disambiguation backbone; supports the others |
| 2026-06-16 | dblp-mcp | More niche audience (CS) but very high quality |
| 2026-06-30 | zenodo-mcp | Loop closure: from review to actual artifacts |

Every release follows the same T-10d → T-0d checklist: deposit the technical report on Zenodo (T-10d), test release on TestPyPI (T-7d), confirm the live DOI (T-3d), make the repo public (T-1d), tag v0.1.0 and publish the blog post (T-0d, Tuesday 9:00 CEST).

The technical reports

Each server has its own technical report in OC-TR format (an in-house OpenCódice template based on the LaTeX article class, with TikZ for branding and headers). All five are on Zenodo under CC-BY 4.0.

Each one is ten to twelve pages, with sections on architecture, tool catalogue, end-to-end validation against the real API, use cases beyond the worked example, limitations, and discussion. They are not formally peer-reviewed publications (yet), but they are citable via DOI and each repo carries a CITATION.cff.

What we take away

Three lessons for anyone planning to build an MCP server in this space:

1. A smoke test against the real API is non-negotiable. The four bugs above (Crossref sort, ORCID expanded-search, DBLP XML, Zenodo bucket-locking) were invisible to offline tests. Every server we have seen in the public MCP catalogue that looks fragile shares the same symptom: the author skipped a serious smoke test before shipping v0.1.

2. Evidence + Pydantic + stdio transport = the right composition. MCP is a good substrate when the tool is ergonomic for an LLM. Return flat, predictable dicts with optional fields defaulting to None; leave policy to the agent; do not try to be "smart" in the wrapper. Wrappers that try to be smart (baked-in taxonomies, classifiers at the boundary) age badly.

3. Academic APIs are irregular by design. ORCID, Crossref, DBLP, and Zenodo are different organisations, founded at different times, with different design choices. The schema layer is where your wrapper pays that tax: without Pydantic, you find out about every drift through an agent exception in production. With Pydantic, you find out on the first call.
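Lessons 2 and 3 fit in one short sketch: normalise the irregular shapes at the boundary, return flat evidence, and let the caller decide what to do with it. The code uses dataclasses to stay dependency-free, whereas the servers use Pydantic v2. The helper names, the `RetractionFlag` fields other than `update_type`, and the two policy functions are hypothetical. The policies only mirror the strict and permissive configurations described earlier in this post.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


def unwrap(node: Any) -> Any:
    """ORCID-style fields arrive as {"value": ...}; return the inner value, else the node."""
    if isinstance(node, dict) and "value" in node:
        return node["value"]
    return node


def as_list(node: Any) -> list:
    """DBLP-style fields may be a single object or a list; always return a list."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def crossref_date(node: Optional[dict]) -> Optional[date]:
    """Accept {"date-time": "..."} or {"date-parts": [[y, m, d]]}; missing month/day become 1."""
    if not node:
        return None
    if node.get("date-time"):
        return date.fromisoformat(node["date-time"][:10])
    parts = (node.get("date-parts") or [[None]])[0]
    if not parts or parts[0] is None:
        return None
    year, month, day = (list(parts) + [1, 1])[:3]
    return date(year, month, day)


@dataclass(frozen=True)
class RetractionFlag:
    """Flat evidence record; carries no verdict."""
    doi: str
    update_type: str
    notice_date: Optional[date] = None


def strict_policy(flags: list) -> bool:
    """Caller-side policy: block on any flag that is not a plain correction."""
    return any(flag.update_type != "correction" for flag in flags)


def permissive_policy(flags: list) -> bool:
    """Caller-side policy: block only on formal retractions."""
    return any(flag.update_type == "retraction" for flag in flags)
```

Both policies read the same `RetractionFlag` list, so switching from one to the other changes nothing on the server side.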

What is next

The family of five is closed at v0.1. Two obvious things on the horizon:

  • cited-by-mcp: a sixth server that, given a DOI, returns everything that cites it. Closes another natural loop (citation-graph reasoning as a composition primitive).
  • zenodo-mcp deposit mode in v0.2: an agent that can publish records, not just read them. That implies a separate credential with scoped permissions, because the blast radius changes.

Looking further out, the pattern generalises: the next under-served academic corpora (PubPeer comments, the editorial retraction feeds of every major publisher, peer-review-credit platforms like Publons or ORCID, disciplinary repositories like bioRxiv and SocArXiv with their own APIs) are all candidates for the same treatment. We set the quality bar at these five; the marginal cost of the next ones is low.

If you build something on top of any of the five, tell us. We will link to it.


OpenCódice Research builds open-source tools for academic workflows. This is the first installment of the academic-MCP family; we are working outward.