Building Better Connections: The Story of Crossref’s Metadata Development

Three years ago, we asked our members what they needed from Crossref’s metadata. We received confirmation that we were going in the right direction, as well as some new ideas to explore. That feedback has set the course for our metadata development work since then, and it continues to guide where we’re headed next.

Every metadata update we make is driven by the same set of priorities: supporting metadata that reflects our organizational truths, focusing on what metadata our members can actually provide, and aligning with best practices, vocabularies, and standards that our wider scholarly community has established. More recently, our Metadata Advisory Group has helped us explore both the minutiae of working with metadata and larger ideas around the value and impact of the metadata we support.

What we’ve accomplished

Our schema 5.4 update included several new or expanded types of metadata. First, citation metadata can now be labeled with a publication type. This means that when a work cites an article, a preprint, a dataset, or software, that distinction is clear, which makes cited works that have no DOI metadata record of their own easier to identify. Second, version information is now supported across all record types, giving the scholarly record a more precise handle on exactly which version of a work is being described.

We’ve also made two meaningful improvements to how funding relationships are captured. ROR IDs are now supported as funder identifiers in both our standard metadata schema and our grants-specific schema. Also, Grant DOIs can now be explicitly identified within funding metadata, making it possible to draw clearer lines between research outputs and the grants that supported them.
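For members preparing funding metadata, it can help to check which kind of funder identifier a value looks like before depositing it. The following is a minimal sketch, not part of any Crossref tooling: the function name `classify_funder_id` and the return labels are hypothetical, the ROR pattern follows ROR's published identifier format, and the `10.13039/` prefix is the one used by Funder Registry DOIs. It only checks the shape of the identifier and does not verify the ROR checksum or that the record exists.

```python
import re

# Shape of a ROR ID: a leading 0, six characters from ROR's alphabet,
# then a two-digit checksum. The checksum itself is NOT verified here.
ROR_PATTERN = re.compile(
    r"^(?:https?://ror\.org/|ror\.org/)?0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$"
)

# Shape of a Funder Registry DOI (prefix 10.13039), with or without a resolver URL.
FUNDER_REGISTRY_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/)?10\.13039/\d+$"
)


def classify_funder_id(value: str) -> str:
    """Return 'ror', 'funder_registry', or 'unknown' (hypothetical labels)."""
    candidate = value.strip().lower()
    if ROR_PATTERN.match(candidate):
        return "ror"
    if FUNDER_REGISTRY_PATTERN.match(candidate):
        return "funder_registry"
    return "unknown"


if __name__ == "__main__":
    for example in ("https://ror.org/02twcfq11", "10.13039/100000001", "n/a"):
        print(example, "->", classify_funder_id(example))
```

A check like this is only a pre-flight aid; the schema and the deposit system remain the authority on what is accepted.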

What’s happening now

A significant update is nearly here. Schema 5.5 will expand contributor metadata to support multiple roles per contributor, and will introduce support for CRediT, the ANSI/NISO taxonomy for contributor roles. This means that an individual’s complete contribution to a research output can finally be described in our metadata, rather than flattened into a single role or omitted entirely. The schema isn’t released yet, but the final version of the XML schema is available in our GitLab repository for those who want to get a head start.
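To illustrate what "multiple roles per contributor" means in practice, here is a minimal sketch of a contributor record that can hold several CRediT roles. This is an in-memory illustration only: the `Contributor` class and its `add_role` method are hypothetical and do not reflect the element names or structure of schema 5.5. The role names are the 14 roles defined by the CRediT taxonomy.

```python
from dataclasses import dataclass, field

# The 14 contributor roles defined by the CRediT taxonomy.
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})


@dataclass
class Contributor:
    """Hypothetical model: one person, any number of CRediT roles."""
    name: str
    roles: list[str] = field(default_factory=list)

    def add_role(self, role: str) -> None:
        # Reject anything outside the controlled vocabulary.
        if role not in CREDIT_ROLES:
            raise ValueError(f"Not a CRediT role: {role!r}")
        # Keep each role once, in the order it was added.
        if role not in self.roles:
            self.roles.append(role)


if __name__ == "__main__":
    person = Contributor("A. Researcher")
    person.add_role("Software")
    person.add_role("Methodology")
    print(person)
```

Using a controlled vocabulary rather than free text is what lets downstream users compare contributions across records.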

Next, we’ll begin implementation work for a new Grants schema (0.3.0). This update will remodel investigator names to include a new role (beneficiary) as well as an organizational grant recipient, making it possible to include recipient information for grants given to organizations. Grant records include project metadata, so this update will also include support for RAiD, a persistent identifier for projects. The XML schema for this update is also available in a GitLab repository.

What’s up next

Our next planned major update will build substantially on the contributor work in version 5.5. In the next version (6.0) we will remodel names to expand our current limited structure to support a variety of name types as well as alternate names. We’ll also expand the contributor identifiers we collect to include ISNI and Wikidata identifiers, better supporting contributors for whom an ORCID is not possible. Our organizational contributor will be remodeled as well to include organization-level identifiers like ROR.

We’ll also introduce statements to Crossref metadata. Statements will allow members to include free-text statements including funding acknowledgements, ethics declarations, AI usage disclosures, and other important contextual information that doesn’t fit neatly into structured fields.

Other updates include expanding our support for abstract encoding beyond JATS to include ONIX, BITS, and a generic markup option, and implementing better in-schema validation to avoid surprises at the time of deposit.

Progress means letting go of the past. We’re planning to deprecate all schemas prior to version 5.3.1 by the end of 2027, to be carried out in phases as outlined in our deprecation blog post. This is a necessary step to keep our infrastructure sustainable and to ensure members are working with schemas that reflect current capabilities and standards.

Looking further ahead

Beyond 6.0, we’re exploring further support for provenance in metadata (to establish who is doing what to a metadata record), a rethinking of how we handle dates so that they better capture the lifecycle of a research object, better support for research objects we don’t yet fully support, and making our metadata inputs more consistent. The Metadata Development roadmap has full details on what’s being explored and prioritized.

Each of these updates contributes to Crossref’s research nexus vision: strengthening connections between funders and research, more accurately capturing and recognizing contributor roles in the scholarly record, and collecting free-text content to fill in the gaps that structured metadata alone can’t address. Better metadata means better research integrity and more trustworthy infrastructure for everyone who depends on it.

Further reading