Scalable biomedical Named Entity Recognition: investigation of a database-supported SVM approach

This paper explores scalability issues in biomedical Named Entity Recognition (NER) using Support Vector Machines (SVMs). We compare the performance of existing binary and multi-class SVM implementations, trained on increasing amounts of data, with that of our new implementations. Our approach requires no prior language- or domain-specific knowledge and achieves good out-of-the-box accuracy, comparable to that of more complex approaches. It reduces the training time of multi-class SVMs by several orders of magnitude, which would make SVMs a more viable and practical solution for real-world problems with large datasets.
