Sprinting to Progress: Behind the scenes of our first metadata sprint

If you take a peek at our blog, you’ll notice that metadata and community are the most frequently used categories. This is not a coincidence: community is central to everything we do at Crossref. Our first-ever Metadata Sprint was a natural step in strengthening both. Cue fanfare! And what better way of celebrating 25 years of Crossref?

We designed the Crossref Metadata Sprint as a relatively short event where people can form teams and tackle short problems. What kind of problems? While we expected many to involve coding, teams also explored documenting, translating, researching—anything that taps into our open, member-curated metadata. Our motivation behind this format was to create a space for networking, collaboration, and feedback, centered on co-creation using the scholarly metadata from our REST API, the Public Data File, and other sources.

What we learned in planning

The journey towards the event was filled with valuable lessons from our community. Our initial call received submissions from 71 people, which was exciting but presented the first challenge: we felt our event would work better with a smaller group. A second challenge was that people from many regions of the world were eager to join but needed support to attend in person. It reminded us how global our community is, and how important it is to think about different ways of making participation possible, especially in future events.

We also wanted to make sure that participation wasn’t limited by technical background. The selection process included a preliminary review by several members of our team to bring in a mix of perspectives and reduce bias. The event welcomed participants of all expertise levels, including colleagues who had never worked with APIs before. To provide common ground for everyone, we held several group calls where we introduced our tools and collected participants’ questions and requests about tools and specific data, so that they could prepare for the sprint.

At the Crossref Metadata Sprint

I recently stumbled upon the following quote from a recognized data scientist:

Numbers have an important story to tell. They rely on you to give them a clear and convincing voice. (Stephen Few) 1

It made me think that we can substitute metadata for numbers and the idea still holds. Surrounded by the paleontological collections of the National Museum of Natural History, on the 8th of April in Madrid, 21 participants and 5 Crossref staff came together to work on twelve different projects. These ranged from improvements to our Public Data File formats and exploring metadata completeness, to tackling multilingual metadata challenges, understanding citation impact for retracted works, and connecting Retraction Watch metadata with metadata from other knowledge graphs.

A mosaic of pictures depicting groups of people working on their laptops
The different teams that participated in the first Crossref Metadata Sprint.

The initial hours were the most energetic (but not chaotic!). Most of the participants had the chance to interact in person for the first time, ideas were exchanged, and pre-formed groups became more stable, although one advantage of the format is that teams don’t have to be rigid. Twelve coffee- and tea-powered projects started taking shape, a few of which are part of larger ideas under development. By the end of the second day, we saw work on the following topics:
  • Author changes between preprints and published articles.
  • Coverage of funding information by publisher.
  • Enriching citations with Crossref metadata.
  • Funding metadata completeness.
  • Improvement to the Public Data File.
  • Interoperability between Crossref DOIs and hash-based identifiers.
  • University of Tetova’s metadata coverage.
  • Retraction Watch data mash-up.
  • Perspective about AI-driven multilingual metadata.
  • Public Data File in Google BigQuery.
  • Visibility of retractions across citations.
  • Visualising Crossref geographic member data.

Crossref staff joined some of these projects, offering insights and feedback to the participants. We ended the first session with a group dinner and returned re-energised for the second day, which started with everybody fully immersed in their tasks. As we approached the conclusion, the groups prepared a few quick slides for a short presentation (which you can find here).

Our team and the participants left excited and looking forward to the next opportunity to collaborate. We certainly see the potential of recreating these spaces, and we’ll work on future editions in a different location. All of the project summaries and notes will remain stored in our metadata sprint GitLab repo. Would you like to know more about any of these ideas? Let us know in the comments.

An arrangement of hexagons summarizing key facts about the 1st Metadata Sprint.

The first Crossref Metadata Sprint in a nutshell

Participants

None of this would’ve been possible without our enthusiastic participants. Huge thanks to everyone! Here is the full list of those who attended our inaugural Sprint:

Name
Blessing Abumere
Ana Bermejo
Robert Bianchi
Adam Buttrick
María de la Paz
Nicoleta Roxana Dinu
Jack Ekinsmyth
Castedo Ellerman
Álvaro Hontanar
Bianca Kramer
Anne L’Hôte
Cyril Labbe
Alexandra Malaga
Agon Memeti
Kaitlin Newson
Yağmur Öztürk
Dietrich Rordorf
Mohamed Selim
Sajad Sepehri
Ramazan Turgut
Iñaki Úcar

Further reading

Page owner: Luis Montilla   |   Last updated 2025-June-23