Drug-Drug Interaction Extraction from Biomedical Texts with SVM and RLS Classifiers

We introduce a system developed to extract drug-drug interactions (DDI) for drug mention pairs found in biomedical texts. This system was developed for the DDI Extraction First Challenge Task 2011 and is based on our publicly available Turku Event Extraction System, which we adapt for the domain of drug-drug interactions. The system relies heavily on deep syntactic parsing to build a representation of the relations between drug mentions. In developing the DDI extraction system, we evaluate the suitability of both text-based and database-derived features for DDI detection. For machine learning, we test both support vector machine (SVM) and regularized least-squares (RLS) classifiers, with detailed experiments for determining the optimal parameters and training approach. Our system achieves a performance of 62.99% F-score on the DDI Extraction 2011 task.
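To make the RLS side of the classifier comparison concrete, the following is a minimal sketch of a linear regularized least-squares classifier in its standard closed form, w = (XᵀX + λI)⁻¹Xᵀy. It is not the system's implementation: the function names, the toy feature matrix, and the value of the regularization parameter `lam` are all assumptions chosen for illustration, and each row of `X` merely stands in for the feature vector of one candidate drug pair.

```python
import numpy as np


def train_rls(X, y, lam=1.0):
    """Fit a linear RLS model by solving (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


def predict_rls(X, w):
    """Classify by the sign of the real-valued RLS output."""
    return np.where(X @ w >= 0.0, 1, -1)


# Hypothetical toy data: one row per candidate drug pair,
# label +1 = interaction, -1 = no interaction.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

lam = 0.1  # assumed value; in practice chosen by cross-validation
w = train_rls(X, y, lam=lam)
preds = predict_rls(X, w)
```

Unlike an SVM, which minimizes hinge loss, RLS minimizes squared error on the labels, so training reduces to solving a single linear system; the regularization parameter plays a role comparable to the SVM's C parameter and would be tuned in the same way.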
