CORE Dataset

Download millions of research outputs for text and data analysis

  • Download all CORE data for big data processing
  • Prototype, analyse and mine the data in your own infrastructure
  • World's largest full-text collection of scientific papers for machine processing

CORE data can be downloaded as a bulk dataset, allowing you to process it on your own computer or within your own infrastructure. The dataset provides a harmonised and enriched data format for accessing content from across our data providers. This makes it well suited to prototyping new methods, especially those that require intensive data processing, as well as to data analysis and text mining.

If you use CORE in your work, we kindly ask you to cite one of our publications.

Dataset 2020-03-18

Full dataset (~400 GB, 2.1 TB extracted)

Dataset 2018-03-01

Metadata-only dataset (beta) (127 GB) - 123M metadata items, 85.6M items with abstract

With-full-text dataset (beta) (330 GB) - 123M metadata items, 85.6M items with abstract, 9.8M items with full text

Documentation and access to previous datasets.
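
Because the 2020-03-18 full dataset listed above is about 400 GB to download and 2.1 TB once extracted, it may help to check free disk space before starting. The following is a minimal sketch only: it assumes the listed sizes are decimal units, the helper names `has_room` and `expansion_ratio` are hypothetical, and the target directory is a placeholder. It does not reflect any official CORE tooling.

```python
import shutil

# Sizes taken from the 2020-03-18 listing; decimal units are an assumption.
DOWNLOAD_BYTES = 400 * 10**9     # ~400 GB archive
EXTRACTED_BYTES = 2100 * 10**9   # 2.1 TB after extraction


def has_room(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def expansion_ratio(download_bytes: int, extracted_bytes: int) -> float:
    """Return how many times larger the extracted data is than the download."""
    return extracted_bytes / download_bytes


if __name__ == "__main__":
    target = "."  # hypothetical extraction directory
    # Worst case: the archive is kept on the same disk while it is extracted.
    needed = DOWNLOAD_BYTES + EXTRACTED_BYTES
    ratio = expansion_ratio(DOWNLOAD_BYTES, EXTRACTED_BYTES)
    print(f"Expansion ratio: {ratio:.2f}x")
    print(f"Enough space for archive plus extracted data: {has_room(target, needed)}")
```

If the archive is deleted or stored elsewhere after extraction, only `EXTRACTED_BYTES` needs to fit on the target disk.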

What’s included

The dataset provides you with:

  • The entire CORE corpus of both metadata and full texts in a machine-processable format.
  • Mappings of CORE articles to entities in the Microsoft Academic Graph (MAG), enabling you to access CORE full texts and use additional entities from MAG where available.
  • Detailed documentation on how to download the CORE dataset and how the data is organised.

The terms of use for the dataset are available on our datasets download page.

Register for the CORE Dataset

Enter your email address to register for our datasets or access the download page if you have already registered. Please enter your institutional email if you are registering in an institutional capacity.

We will send the instructions to this address.