Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus

With the constant growth of the scientific literature, automated processes to enable access to its contents are increasingly in demand. Several functional discourse annotation schemes have been proposed to facilitate information extraction and summarisation from scientific articles, the best known being argumentative zoning. Core Scientific Concepts (CoreSC) is a three-layered, fine-grained annotation scheme providing content-based annotations at the sentence level; it has been used to index, extract and summarise scientific publications in the biomedical literature. A previously developed CoreSC corpus, on which existing automated tools have been trained, contains a single annotation for each sentence. In practice, however, more than one CoreSC concept can appear in the same sentence. Here, we present the Multi-CoreSC CRA corpus, a text corpus specific to the domain of cancer risk assessment (CRA), consisting of 50 full-text papers, each of which contains sentences annotated with one or more CoreSCs. The full-text papers have been annotated by three biology experts. We present several inter-annotator agreement measures appropriate for multi-label annotation assessment. Employing these measures, we identified the most reliable annotator and built a harmonised consensus (gold standard) from the three annotators, while also taking concept priority (as specified in the guidelines) into account. We also show that the new Multi-CoreSC CRA corpus allows us to improve performance in the recognition of CoreSCs. The updated guidelines, the multi-label CoreSC CRA corpus and other relevant, related materials are available at the time of publication at http://www.sapientaproject.com/.
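To make the multi-label setting concrete, one simple family of agreement measures compares, per sentence, the label *sets* assigned by each pair of annotators rather than single labels. The sketch below is an illustration only, not the specific measures used in the paper: it computes mean pairwise Jaccard overlap over hypothetical CoreSC label sets for one sentence.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Overlap between two label sets; two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def multilabel_agreement(annotations: list) -> float:
    """Mean pairwise Jaccard overlap across all annotator pairs for one sentence."""
    pairs = list(combinations(annotations, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)

# Hypothetical example: three annotators label one sentence with CoreSC categories.
sentence_labels = [{"Method", "Result"}, {"Method"}, {"Method", "Conclusion"}]
print(round(multilabel_agreement(sentence_labels), 3))  # → 0.444
```

A corpus-level score would average this quantity over all sentences; measures such as fuzzy kappa additionally correct for chance agreement, which this sketch omits.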
