CODA-19: Reliably Annotating Research Aspects on 10,000+ CORD-19 Abstracts Using a Non-Expert Crowd

This paper introduces CODA-19, a human-annotated dataset that codes the Background, Purpose, Method, Finding/Contribution, and Other sections of 10,966 English abstracts in the COVID-19 Open Research Dataset (CORD-19). CODA-19 was created by 248 crowd workers from Amazon Mechanical Turk within 10 days, achieving labeling quality comparable to that of experts. Each abstract was annotated by nine different workers, and the final labels were obtained by majority vote. The inter-annotator agreement (Cohen's kappa) between the crowd and a biomedical expert (0.741) is comparable to the inter-expert agreement (0.788). CODA-19's labels reach an accuracy of 82.2% against the biomedical expert's labels, while the accuracy between experts was 85.0%. Reliable human annotations help scientists understand the rapidly growing coronavirus literature and also fuel AI/NLP research, but obtaining expert annotations is slow. We demonstrate that a non-expert crowd can be employed rapidly and at scale to join the fight against COVID-19.
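The aggregation and evaluation protocol described above is straightforward to reproduce. The sketch below is a minimal illustration, assuming scikit-learn and using hypothetical toy labels rather than the actual CODA-19 data: nine crowd votes per segment are collapsed to a single label by majority vote and then scored against expert labels with accuracy and Cohen's kappa.

```python
# Minimal sketch of the protocol described in the abstract: nine crowd labels
# per segment are collapsed by majority vote, then compared with an expert's
# labels via accuracy and Cohen's kappa. All data below is illustrative.
from collections import Counter

from sklearn.metrics import accuracy_score, cohen_kappa_score

ASPECTS = ["background", "purpose", "method", "finding", "other"]

def majority_vote(votes):
    """Return the most frequent label among a segment's crowd annotations."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical input: each inner list holds nine crowd labels for one segment.
crowd_votes = [
    ["background"] * 6 + ["purpose"] * 3,
    ["method"] * 5 + ["finding"] * 4,
    ["other"] * 5 + ["finding"] * 4,
]
expert_labels = ["background", "method", "finding"]

crowd_labels = [majority_vote(v) for v in crowd_votes]

print("accuracy:", accuracy_score(expert_labels, crowd_labels))
print("kappa:   ", cohen_kappa_score(expert_labels, crowd_labels, labels=ASPECTS))
```

Unlike raw accuracy, Cohen's kappa discounts the agreement expected by chance, which is why the paper reports both measures when comparing crowd and expert labels.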
