CODA-19: Using a Non-Expert Crowd to Annotate Research Aspects on 10,000+ Abstracts in the COVID-19 Open Research Dataset

This paper introduces CODA-19, a human-annotated dataset that codes the Background, Purpose, Method, Finding/Contribution, and Other sections of 10,966 English abstracts in the COVID-19 Open Research Dataset. CODA-19 was created by 248 crowd workers from Amazon Mechanical Turk within 10 days and achieved labeling quality comparable to that of experts. Each abstract was annotated by nine different workers, and the final labels were obtained by majority vote. The inter-annotator agreement (Cohen's kappa) between the crowd and the biomedical expert (0.741) is comparable to the inter-expert agreement (0.788). CODA-19's labels have an accuracy of 82.2% when compared with the biomedical expert's labels, while the accuracy between experts was 85.0%. Reliable human annotations help scientists access and integrate the rapidly growing coronavirus literature and also fuel AI/NLP research, but obtaining expert annotations can be slow. We demonstrated that a non-expert crowd can be rapidly employed at scale to join the fight against COVID-19.
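The aggregation and agreement measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the label strings, the tie-breaking rule (first label in `LABELS` order), the helper names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical label names for the five aspects; order is used only for tie-breaking.
LABELS = ["background", "purpose", "method", "finding", "other"]


def majority_vote(votes):
    """Return the most frequent label among one item's worker votes.

    Ties are broken by LABELS order, which is an assumption of this sketch.
    """
    counts = Counter(votes)
    best = max(counts.values())
    return next(label for label in LABELS if counts.get(label, 0) == best)


def accuracy(a, b):
    """Fraction of items on which two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(a)
    p_o = accuracy(a, b)  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's label frequencies.
    p_e = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if p_e == 1.0:  # both annotators used a single identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy data: nine worker votes for each of four items (made up for illustration).
worker_votes = [
    ["background"] * 5 + ["purpose"] * 4,
    ["method"] * 9,
    ["finding"] * 6 + ["method"] * 3,
    ["purpose"] * 4 + ["background"] * 4 + ["other"],  # tie between two labels
]
crowd_labels = [majority_vote(v) for v in worker_votes]
expert_labels = ["background", "method", "finding", "purpose"]

crowd_expert_accuracy = accuracy(crowd_labels, expert_labels)
crowd_expert_kappa = cohen_kappa(crowd_labels, expert_labels)
```

Kappa discounts the agreement that two annotators would reach by chance given their label frequencies, which is why the abstract reports it alongside raw accuracy.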
