Speculation and Negation: Rules, Rankers, and the Role of Syntax

This article explores a combination of deep and shallow approaches to the problem of resolving the scope of speculation and negation within a sentence, specifically in the domain of biomedical research literature. The first part of the article focuses on speculation. After showing how speculation cues can be accurately identified using a very simple classifier informed only by local lexical context, we explore two different syntactic approaches to resolving the in-sentence scopes of these cues. Whereas one uses manually crafted rules operating over dependency structures, the other automatically learns a discriminative ranking function over nodes in constituent trees. We provide an in-depth error analysis and a discussion of various linguistic properties characterizing the problem, and show that although both approaches perform well in isolation, even better results can be obtained by combining them, yielding the best published results to date on the CoNLL-2010 Shared Task data. The last part of the article describes how our speculation system is ported to also resolve the scope of negation. With only modest modifications to the initial design, the system obtains state-of-the-art results on this task as well.
