Evaluating a meta-knowledge annotation scheme for bio-events

The correct interpretation of biomedical texts by text mining systems requires the recognition of a range of types of high-level information (or meta-knowledge) about the text. Examples include expressions of negation and speculation, as well as pragmatic/rhetorical intent (e.g. whether the information expressed represents a hypothesis, generally accepted knowledge, new experimental knowledge, etc.). Although such types of information have previously been annotated at the text-span level (most commonly sentences), annotation at the level of the event is currently quite sparse. In this paper, we focus on the evaluation of the multi-dimensional annotation scheme that we have developed specifically for enriching bio-events with meta-knowledge information. Our annotation scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. To our knowledge, our scheme is unique within the field with regard to the diversity of meta-knowledge aspects annotated for each event, and the evaluation results have confirmed its feasibility and soundness.
