Predicting speculation: a simple disambiguation approach to hedge detection in biomedical literature

Background: This paper presents a novel approach to the problem of hedge detection, which involves identifying so-called hedge cues for labeling sentences as certain or uncertain. This is the classification problem for Task 1 of the CoNLL-2010 Shared Task, which focuses on hedging in the biomedical domain. We propose to view hedge detection as a simple disambiguation problem, restricted to words that have previously been observed as hedge cues. As the feature space for the classifier is still very large, we also perform experiments with dimensionality reduction using the method of random indexing.

Results: The SVM-based classifiers developed in this paper achieve the best published results so far for sentence-level uncertainty prediction on the CoNLL-2010 Shared Task test data. We also show that the technique of random indexing can be successfully applied to reduce the dimensionality of the original feature space by several orders of magnitude without sacrificing classifier performance.

Conclusions: This paper introduces a simplified approach to detecting speculation or uncertainty in text, focusing on the biomedical domain. Evaluated at the sentence level, our SVM-based classifiers achieve the best published results so far. We also show that the feature space can be aggressively compressed using random indexing while still maintaining comparable classifier performance.
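To make the dimensionality-reduction idea concrete, the following is a minimal sketch of random indexing in general, not the paper's implementation. Each original feature is assigned a sparse ternary index vector (a few +1 and -1 entries, zeros elsewhere), and an instance is represented by the weighted sum of the index vectors of its active features. The function names, the feature strings, and the settings `dim=16` and `nonzeros=4` are hypothetical and chosen only for illustration; the abstract does not state the parameters actually used.

```python
import random


def make_index_vector(feature, dim, nonzeros, seed=0):
    """Return a sparse ternary index vector as {position: +1 or -1}.

    The vector is derived deterministically from the feature name, so the
    same feature always maps to the same positions and signs.
    """
    rng = random.Random(f"{seed}:{feature}")
    positions = rng.sample(range(dim), nonzeros)
    # Alternate signs so that half of the non-zero entries are +1, half -1.
    return {pos: (1 if i % 2 == 0 else -1) for i, pos in enumerate(positions)}


def project(features, dim=16, nonzeros=4, seed=0):
    """Map a sparse feature dict {name: value} to a dense dim-sized vector."""
    reduced = [0.0] * dim
    for feature, value in features.items():
        index_vector = make_index_vector(feature, dim, nonzeros, seed)
        for pos, sign in index_vector.items():
            reduced[pos] += sign * value
    return reduced


# Hypothetical features for one candidate cue word in context.
instance = {"lemma=suggest": 1.0, "pos=VBP": 1.0, "next_lemma=that": 1.0}
vector = project(instance)
print(len(vector), vector)
```

The reduced vectors can then be passed to any standard classifier in place of the original high-dimensional features. Because the projection is linear, the vector for an instance equals the sum of the vectors of its individual features, so no index vectors need to be stored beyond the seed.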
