Blog

Free public data file of 112+ million Crossref records

A lot of people have been using our public, open APIs to collect data that might be related to COVID-19. This is great and we encourage it. We also want to make it easier. To that end we have made a free data file of the public elements from Crossref’s 112.5 million metadata records.

The file (65GB, in JSON format) is available via Academic Torrents here: https://doi.org/10.13003/83B2GP

It is important to note that Crossref metadata is always openly available. The difference here is that we’ve done the time-saving work of putting all of the records registered through March 2020 into one file for download.
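
For anyone who would rather keep pulling smaller slices from the API than download the full file, here is a minimal sketch of a query against the public REST API's `/works` route. The helper names, the search terms and the contact address are illustrative assumptions rather than part of any Crossref tooling. The `query`, `rows` and `mailto` parameters and the `message.items` response layout follow the public REST API.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org/works"


def build_works_query(terms, rows=20, mailto=None):
    """Build a /works URL for a free-text query (hypothetical helper)."""
    params = {"query": terms, "rows": rows}
    if mailto:
        # Supplying a contact address identifies the caller to the API.
        params["mailto"] = mailto
    return f"{API_ROOT}?{urlencode(params)}"


def extract_dois(response):
    """Collect DOIs from a decoded /works JSON response."""
    items = response.get("message", {}).get("items", [])
    return [item["DOI"] for item in items if "DOI" in item]


def fetch_dois(terms, rows=20, mailto=None):
    """Run the query over the network and return the matching DOIs."""
    with urlopen(build_works_query(terms, rows, mailto)) as resp:
        return extract_dois(json.load(resp))
```

Calling `fetch_dois("covid-19", rows=5, mailto="you@example.org")` would return up to five DOIs for that search. For bulk analysis across all records, the data file is likely the better starting point.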

You’ve had your say, now what? Next steps for schema changes

It seems like ages ago, particularly given recent events, but we had our first public request for feedback on proposed schema updates in December and January. The feedback we received indicated two big things: we’re on the right track, and you want us to go further. This update makes some significant and important changes to contributors, but is otherwise a fairly moderate update. The feedback was mostly supportive, with a fair number of helpful suggestions about details.

Encouraging even greater reporting of corrections and retractions

TL;DR: We no longer charge fees for members to participate in Crossmark, and we encourage all our members to register metadata about corrections and retractions - even if you can’t yet add the Crossmark button and pop-up box to your landing pages or PDFs.

Events got the better of us

Publisher metadata is one side of the story surrounding research outputs, but the conversations, connections and activities that build around scholarly research take place all over the web. We built Event Data to capture, record and make available these ‘Events’, providing open, transparent, and traceable information about the provenance and context of every Event. Events are comments, links, shares, bookmarks, references, etc.

Metadata Manager Update

At Crossref, we’re committed to providing a simple, usable, efficient and scalable web-based tool for registering content by manually making deposits of, and updates to, metadata records. Last year we launched Metadata Manager in beta for journal deposits to help us explore this further. Since then, many members have used the tool and helped us better understand their needs.

Double trouble with DOIs

Dominika Tkaczyk – 2020 March 10

In R&D, Metadata

Detective Matcher stopped abruptly behind the corner of a short building, praying that his loud heartbeat wouldn’t give away his presence. This missing DOI case was unlike any other before, keeping him awake for many seconds already. It took a great effort and a good amount of help from his clever assistant Fuzzy Comparison to make sense of the sparse clues provided by Miss Unstructured Reference, an elegant young lady with a shy smile, who begged him to take up this case at any cost.

Crossref metadata for bibliometrics

Our paper, Crossref: the sustainable source of community-owned scholarly metadata, was recently published in Quantitative Science Studies (MIT Press). The paper describes the scholarly metadata collected and made available by Crossref, as well as its importance in the scholarly research ecosystem.

Using the Crossref REST API (with Open Ukrainian Citation Index)

Over the past few years, I’ve been really interested in seeing the breadth of uses that the research community is finding for the Crossref REST API. When we ran Crossref LIVE Kyiv in March 2019, Serhii Nazarovets joined us to present his plans for the Open Ukrainian Citation Index, an initiative he explains below.

But first an introduction to Serhii and his colleague Tetiana Borysova.

Serhii Nazarovets is a Deputy Director for Research at the State Scientific and Technical Library of Ukraine. Serhii has a Ph.D. in Social Communication Science. His research interests lie in the area of scientometrics and library science. Serhii is the Associate Editor for DOAJ (www.doaj.org) and the Regional Editor for E-LIS (Eprints in Library and Information Science). Serhii has worked in different scientific libraries of Ukraine for more than 10 years. Tetiana Borysova is a Senior Researcher at the State Scientific and Technical Library of Ukraine. Her research interests are focused on topics such as research data management, journal management and scientometrics.

Proposed schema changes - have your say

The first version of our metadata input schema (a DTD, to be specific) was created in 1999 to capture basic bibliographic information and facilitate matching DOIs to citations. Over the past 20 years the bibliographic metadata we collect has deepened, and we’ve expanded our schema to include funding information, license, updates, relations, and other metadata. Our schema isn’t as venerable as a MARC record or as comprehensive as JATS, but it’s served us well. It’s not currently positioned to fully support everything we want to do long term - we’d like to support assertions, map cleanly to JATS and schema.org magically at the same time, and maybe even move beyond XML - but for now it’s something we can work with to empower member metadata to help find, cite, and connect scholarly content.

Request for feedback: Conference ID implementation

We’ve all been subject to floods of conference invitations, and it can be difficult to sort the relevant from the irrelevant or (even worse) sketchy conferences competing for our attention. In 2017, DataCite and Crossref started a working group to investigate creating identifiers for conferences and projects. Identifiers describe and disambiguate, and applying identifiers to conference events will help build clear, durable connections between scholarly events and scholarly literature.

Chaired by Aliaksandr Birukou, the Executive Editor for Computer Science at Springer Nature, the group has met regularly over the past two years, collaborating to create use cases and define metadata to identify and describe conference series and events. We first asked for input on metadata specifications in April 2018. Technical implementation kicked off in February with a workshop at CERN to discuss the mechanics of making PIDs for conferences a reality.