Using SVMs with the Command Relation features to identify negated events in biomedical literature

In this paper we explore the identification of negated molecular events (e.g. protein binding, gene expression, regulation) in biomedical research abstracts. We construe the problem as a classification task and apply a machine learning (ML) approach that uses lexical, syntactic, and semantic features associated with the sentences that describe events. Lexical features include negation cues, whereas syntactic features are engineered from constituency parse trees and the command relation between constituents. Semantic features include the event type and participants. We also consider a rule-based approach that uses only the command relation. On a test dataset, the ML approach showed significantly better results (51% F-measure) than the command-based rules (35-42% F-measure). Training a separate classifier for each event class proved useful: the micro-averaged F-score improved to 63% (with 88% precision), demonstrating the potential of task-specific ML approaches to negation detection.
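
To make the command relation concrete, the following minimal Python sketch checks whether one node of a constituency tree commands another. It assumes the generalised reading of the relation (node a X-commands node b if the lowest proper ancestor of a labelled X also dominates b), which may differ in detail from the definition used in the paper. The `Node` class, the `x_commands` function, and the toy tree are hypothetical illustrations, not the authors' implementation.

```python
class Node:
    """A bare-bones constituency tree node (hypothetical helper)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Yield the proper ancestors of a node, nearest first."""
    while node.parent is not None:
        node = node.parent
        yield node


def x_commands(a, b, label="S"):
    """Return True if the lowest proper ancestor of `a` with the given
    label also dominates `b` (assumed definition of X-command)."""
    for anc in ancestors(a):
        if anc.label == label:
            return any(anc is other for other in ancestors(b))
    return False  # `a` has no ancestor with that label


# Toy tree for an invented sentence of the shape
# "X binds Y although it does not activate Z".
not_cue = Node("RB")    # negation cue "not"
activate = Node("VB")   # trigger inside the subordinate clause
binds = Node("VB")      # trigger in the main clause
inner_s = Node("S", [
    Node("NP"),
    Node("VP", [Node("AUX"), not_cue, Node("VP", [activate, Node("NP")])]),
])
root = Node("S", [
    Node("NP"),
    Node("VP", [binds, Node("NP")]),
    Node("SBAR", [Node("IN"), inner_s]),
])

cue_commands_activate = x_commands(not_cue, activate, "S")
cue_commands_binds = x_commands(not_cue, binds, "S")
```

Under this assumed definition, the cue S-commands the trigger in its own clause but not the trigger in the main clause. A command-based rule could use such a test to decide whether an event is negated, and the same boolean could serve as one syntactic feature for a classifier.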
