A three-way perspective on scientific discourse annotation for knowledge extraction

This paper presents a three-way perspective on the annotation of discourse in scientific literature. We use three different schemes, each of which focusses on different aspects of discourse in scientific articles, to annotate a corpus of three full-text papers, and compare the results. One scheme seeks to identify the core components of scientific investigations at the sentence level, a second annotates meta-knowledge pertaining to bio-events and a third considers how epistemic knowledge is conveyed at the clause level. We present our analysis of the comparison, and a discussion of the contributions of each scheme.

[1]  K. Bretonnel Cohen,et al.  The structural and content aspects of abstracts versus bodies of full text journal articles are different , 2010, BMC Bioinformatics.

[2]  Marc Moens,et al.  Articles Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status , 2002, CL.

[3]  János Csirik,et al.  The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text , 2010, CoNLL Shared Task.

[4]  Sophia Ananiadou,et al.  Enriching a biomedical event corpus with meta-knowledge annotation , 2011, BMC Bioinformatics.

[5]  Maria Liakata,et al.  Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes , 2010, BioNLP@ACL.

[6]  Jun'ichi Tsujii,et al.  Corpus annotation for mining biomedical events from literature , 2008, BMC Bioinformatics.

[7]  Ross D King,et al.  An ontology of scientific experiments , 2006, Journal of The Royal Society Interface.

[8]  Jari Björne,et al.  Extracting Complex Biological Events with Rich Graph-Based Feature Sets , 2009, BioNLP@HLT-NAACL.

[9]  Hagit Shatkay,et al.  Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users , 2008, Bioinform..

[10]  Sophia Ananiadou,et al.  Construction of an annotated corpus to support biomedical information extraction , 2009, BMC Bioinformatics.

[11]  Jean Carletta,et al.  An annotation scheme for discourse-level argumentation in research articles , 1999, EACL.

[12]  Michael Gamon,et al.  MSR-NLP Entry in BioNLP Shared Task 2011 , 2011, BioNLP@ACL.

[13]  Simone Teufel,et al.  Towards Domain-Independent Argumentative Zoning: Evidence from Chemistry and Computational Linguistics , 2009, EMNLP.

[14]  Naoaki Okazaki,et al.  Identifying Sections in Scientific Abstracts using Conditional Random Fields , 2008, IJCNLP.

[15]  Anita de Waard,et al.  Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features , 2012, ACL 2012.

[16]  János Csirik,et al.  The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes , 2008, BMC Bioinformatics.

[17]  Sophia Ananiadou,et al.  Identification of Manner in Bio-Events , 2012, LREC.

[18]  Halil Kilicoglu,et al.  Recognizing speculative language in biomedical research articles: a linguistically motivated perspective , 2008, BMC Bioinformatics.

[19]  Ágnes Sándor,et al.  Modeling metadiscourse conveying the author's rhetorical strategy in biomedical research abstracts , 2007 .

[20]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[21]  K. Bretonnel Cohen,et al.  Frontiers of biomedical text mining: current progress , 2007, Briefings Bioinform..

[22]  Sampo Pyysalo,et al.  Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011 , 2012, BMC Bioinformatics.

[23]  Nigel Collier,et al.  Zone analysis in biology articles as a basis for information extraction , 2006, Int. J. Medical Informatics.

[24]  Simone Teufel Towards Discipline-Independent Argumentative Zoning : Evidence from Chemistry and Computational Linguistics , 2009 .

[25]  Sophia Ananiadou,et al.  Boosting automatic event extraction from the literature using domain adaptation and coreference resolution , 2012, Bioinform..

[26]  Dietrich Rebholz-Schuhmann,et al.  Automatic recognition of conceptualization zones in scientific articles and two life science applications , 2012, Bioinform..

[27]  Anita de Waard,et al.  Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features , 2012 .

[28]  Simone Teufel,et al.  The Structure of Scientific Articles - Applications to Citation Indexing and Summarization , 2010, CSLI Studies in Computational Linguistics.

[29]  Sampo Pyysalo,et al.  EXTRACTING BIO‐MOLECULAR EVENTS FROM LITERATURE—THE BIONLP’09 SHARED TASK , 2011, Comput. Intell..

[30]  Padmini Srinivasan,et al.  Categorization of Sentence Types in Medical Abstracts , 2003, AMIA.

[31]  Roser Morante,et al.  Modality and Negation: An Introduction to the Special Issue , 2012, CL.

[32]  Junichi Tsujii,et al.  Event extraction for systems biology by text mining the literature. , 2010, Trends in biotechnology.

[33]  K. Bretonnel Cohen,et al.  Getting Started in Text Mining , 2008, PLoS Comput. Biol..

[34]  Maria Liakata,et al.  A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment , 2011, BMC Bioinformatics.

[35]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[36]  Maria Liakata,et al.  An ontology methodology and CISP-the proposed Core Information about Scientific Papers , 2007 .

[37]  Dietrich Rebholz-Schuhmann,et al.  Using argumentation to extract key sentences from biomedical abstracts , 2007, Int. J. Medical Informatics.

[38]  Maria Liakata,et al.  The ART Corpus , 2009 .

[39]  Sophia Ananiadou,et al.  Extracting semantically enriched events from biomedical literature , 2012, BMC Bioinformatics.

[40]  Sophia Ananiadou,et al.  Text mining and its potential applications in systems biology. , 2006, Trends in biotechnology.

[41]  Simone Teufel,et al.  Corpora for the Conceptualisation and Zoning of Scientific Papers , 2010, LREC.

[42]  Sophia Ananiadou,et al.  Text Mining for Biology And Biomedicine , 2005 .

[43]  Jun'ichi Tsujii,et al.  Event Extraction with Complex Event Classification Using Rich Features , 2010, J. Bioinform. Comput. Biol..

[44]  Jörg Tiedemann,et al.  Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) , 2012 .

[45]  Catherine Blake,et al.  Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles , 2010, J. Biomed. Informatics.

[46]  Padmini Srinivasan,et al.  The Language of Bioscience: Facts, Speculations, and Statements In Between , 2004, HLT-NAACL 2004.

[47]  K. Hyland,et al.  Writing Without Conviction? Hedging in Science Research Articles , 1996 .

[48]  Paul Buitelaar,et al.  Identifying the Epistemic Value of Discourse Segments in Biology Texts (project abstract) , 2009, IWCS.

[49]  Jun'ichi Tsujii,et al.  Semantic Retrieval for the Accurate Identification of Relational Concepts in Massive Textbases , 2006, ACL.

[50]  Sophia Ananiadou,et al.  Meta-Knowledge Annotation of Bio-Events , 2010, LREC.

[51]  K. Bretonnel Cohen,et al.  U-Compare: A modular NLP workflow construction and evaluation system , 2011, IBM J. Res. Dev..