Exploiting Multi-Features to Detect Hedges and their Scope in Biomedical Texts

In this paper, we present a machine learning approach that detects hedge cues and their scope in biomedical texts. Identifying hedged information in texts is a kind of semantic filtering of texts and it is important since it could extract speculative information from factual information. In order to deal with the semantic analysis problem, various evidential features are proposed and integrated through a Conditional Random Fields (CRFs) model. Hedge cues that appear in the training dataset are regarded as keywords and employed as an important feature in hedge cue identification system. For the scope finding, we construct a CRF-based system and a syntactic pattern-based system, and compare their performances. Experiments using test data from CoNLL-2010 shared task show that our proposed method is robust. F-score of the biological hedge detection task and scope finding task achieves 86.32% and 54.18% in in-domain evaluations respectively.

[1]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[2]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[3]  Ben Medlock,et al.  Exploring hedge identification in biomedical literature , 2008, J. Biomed. Informatics.

[4]  György Szarvas,et al.  Hedge Classification in Biomedical Texts with a Weakly Supervised Selection of Keywords , 2008, ACL.

[5]  Peter L. Elkin,et al.  A controlled trial of automated classification of negation from clinical notes , 2005, BMC Medical Informatics Decis. Mak..

[6]  János Csirik,et al.  The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes , 2008, BMC Bioinformatics.

[7]  Michael Strube,et al.  Finding Hedges by Chasing Weasels: Hedge Detection Using Wikipedia Tags and Shallow Linguistic Features , 2009, ACL.

[8]  Sophia Ananiadou,et al.  Developing a Robust Part-of-Speech Tagger for Biomedical Text , 2005, Panhellenic Conference on Informatics.

[9]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[10]  Padmini Srinivasan,et al.  The Language of Bioscience: Facts, Speculations, and Statements In Between , 2004, HLT-NAACL 2004.

[11]  Ted Briscoe,et al.  Weakly Supervised Learning for Hedge Classification in Scientific Literature , 2007, ACL.

[12]  Roser Morante,et al.  Learning the Scope of Hedge Cues in Biomedical Texts , 2009, BioNLP@HLT-NAACL.

[13]  János Csirik,et al.  The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text , 2010, CoNLL Shared Task.

[14]  Ellen Riloff,et al.  Learning subjective nouns using extraction pattern bootstrapping , 2003, CoNLL.

[15]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.