COVID-19 Open Research Dataset (CORD-19)

A Free, Open Resource for the Global Research Community

In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19). This free resource contains over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19 and the coronavirus family of viruses, for use by the global research community.

This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services such as bioRxiv and medRxiv.

  • Commercial use subset (includes PMC content) -- 9000 papers, 186 MB
  • Non-commercial use subset (includes PMC content) -- 1973 papers, 36 MB
  • PMC custom license subset -- 1426 papers, 19 MB
  • bioRxiv/medRxiv subset (pre-prints that are not peer reviewed) -- 803 papers, 13 MB
  • Metadata file -- 47 MB
  • Readme
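
As a quick consistency check, the per-subset figures listed above can be tallied against the "over 13,000 with full text" figure in the description. The sketch below is illustrative only: the dictionary and function names are hypothetical, the numbers are copied from the list on this page, and the sizes are assumed to be in megabytes.

```python
# Paper counts and download sizes copied from the list on this page.
# Sizes are assumed to be megabytes; all names here are hypothetical.
SUBSETS = {
    "Commercial use subset": (9000, 186),
    "Non-commercial use subset": (1973, 36),
    "PMC custom license subset": (1426, 19),
    "bioRxiv/medRxiv subset": (803, 13),
}
METADATA_SIZE_MB = 47


def summarize(subsets):
    """Return (total papers, total size in MB) across all subsets."""
    total_papers = sum(papers for papers, _ in subsets.values())
    total_size_mb = sum(size for _, size in subsets.values())
    return total_papers, total_size_mb


total_papers, total_size_mb = summarize(SUBSETS)
print(f"Full-text papers across subsets: {total_papers}")
print(f"Subset downloads: {total_size_mb} MB")
print(f"Including metadata file: {total_size_mb + METADATA_SIZE_MB} MB")
```

The four subsets sum to 13,202 papers, which agrees with the full-text figure quoted in the description.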



Last updated March 19, 2020, 10:28 (CST)
Created March 19, 2020, 10:19 (CST)