Linguistic scope-based and biological event-based speculation and negation annotations in the Genia Event and BioScope corpora

Background: The treatment of negation and hedging in natural language processing has received much interest recently, especially in the biomedical domain. However, open access corpora annotated for negation and/or speculation are hardly available for training and testing applications, and even if they are, they sometimes follow different design principles. In this paper, the annotation principles of the two largest corpora containing annotation for negation and speculation – BioScope and Genia Event – are compared. BioScope marks linguistic cues and their scopes for negation and speculation while in Genia biological events are marked for uncertainty and/or negation. Results: Differences among the annotations of the two corpora are thematically categorized and the frequency of each category is estimated. We found that the largest amount of differences is due to the issue that scopes – which cover text spans – deal with the key events and each argument (including events within events) of these events is under the scope as well. In contrast, Genia deals with the modality of events within events independently. Conclusions: We think that the useful information for the biologist can be acquired from the key events, thus if we aim to detect ”new knowledge”, an automatic scope-detector trained on BioScope can contribute to biomedical information extraction. However, for detecting the negation and speculation status of events (within events) syntax-based rules investigating the dependency path between the modality cue and the event cue may be employed.

[1]  János Csirik,et al.  The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text , 2010, CoNLL Shared Task.

[2]  Padmini Srinivasan,et al.  The Language of Bioscience: Facts, Speculations, and Statements In Between , 2004, HLT-NAACL 2004.

[3]  György Szarvas,et al.  Hedge Classification in Biomedical Texts with a Weakly Supervised Selection of Keywords , 2008, ACL.

[4]  Dragomir R. Radev,et al.  Detecting Speculations and their Scopes in Scientific Text , 2009, EMNLP.

[5]  Sampo Pyysalo,et al.  Overview of BioNLP’09 Shared Task on Event Extraction , 2009, BioNLP@HLT-NAACL.

[6]  Carol Friedman,et al.  Research Paper: A General Natural-language Text Processor for Clinical Radiology , 1994, J. Am. Medical Informatics Assoc..

[7]  Martin Krallinger Importance of negations and experimental qualifiers in biomedical literature , 2010, NeSp-NLP@ACL.

[8]  Jun'ichi Tsujii,et al.  Corpus annotation for mining biomedical events from literature , 2008, BMC Bioinformatics.

[9]  Ted Briscoe,et al.  Weakly Supervised Learning for Hedge Classification in Scientific Literature , 2007, ACL.

[10]  Sophia Ananiadou,et al.  Evaluating a meta-knowledge annotation scheme for bio-events , 2010, NeSp-NLP@ACL.

[11]  James Pustejovsky,et al.  FactBank: a corpus annotated with event factuality , 2009, Lang. Resour. Evaluation.

[12]  János Csirik,et al.  The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes , 2008, BMC Bioinformatics.

[13]  Halil Kilicoglu,et al.  Syntactic Dependency Based Heuristics for Biological Event Extraction , 2009, BioNLP@HLT-NAACL.

[14]  Michael Strube,et al.  Finding Hedges by Chasing Weasels: Hedge Detection Using Wikipedia Tags and Shallow Linguistic Features , 2009, ACL.

[15]  Roser Morante,et al.  Joint Memory-Based Learning of Syntactic and Semantic Dependencies in Multiple Languages , 2009, CoNLL Shared Task.

[16]  Kazuhiko Ohe,et al.  TEXT2TABLE: Medical Text Summarization System Based on Named Entity Recognition and Modality Identification , 2009, BioNLP@HLT-NAACL.

[17]  Hagit Shatkay,et al.  Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users , 2008, Bioinform..

[18]  Özlem Uzuner,et al.  Viewpoint Paper: Recognizing Obesity and Comorbidities in Sparse Data , 2009, J. Am. Medical Informatics Assoc..

[19]  Isaac G. Councill,et al.  What's great and what's not: learning to classify the scope of negation for improved sentiment analysis , 2010, NeSp-NLP@ACL.

[20]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[21]  K. Bretonnel Cohen,et al.  A shared task involving multi-label classification of clinical free text , 2007, BioNLP@ACL.

[22]  Wendy W. Chapman,et al.  ConText: An Algorithm for Identifying Contextual Features from Clinical Text , 2007, BioNLP@ACL.