Adapting SVM for data sparseness and imbalance: a case study in information extraction

Support Vector Machines (SVMs) have been used successfully in many Natural Language Processing (NLP) tasks. The novel contribution of this paper is an investigation of two techniques for making SVMs more suitable for language learning tasks. First, we propose an SVM with uneven margins (SVMUM) model to deal with the problem of imbalanced training data. Second, we employ SVM active learning to alleviate the difficulty of obtaining labelled training data. The algorithms are presented and evaluated on several Information Extraction (IE) tasks, where they achieve better performance than the standard SVM and the SVM with passive learning, respectively. Moreover, by combining SVMUM with the active learning algorithm, we achieve the best reported results on the seminars and jobs corpora, which are benchmark data sets used for evaluating and comparing machine learning algorithms for IE. We also evaluate the token-based classification framework for IE with three different entity tagging schemes. Compared with previous methods dealing with the same problems, our methods are both effective and efficient, which are valuable features for real-world applications. Because the learning problem is formulated similarly for IE and for other NLP tasks, the two techniques are likely to be beneficial in a wide range of applications.
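To make the active learning idea concrete, the following is a minimal sketch of one common selection strategy for SVM active learning: query the unlabelled examples that lie closest to the current separating hyperplane. The abstract does not state which selection criterion the paper uses, so this criterion is an assumption, and the names `decision_value` and `select_queries`, the linear model, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of a linear classifier (hypothetical helper)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_queries(
    w: Sequence[float],
    b: float,
    pool: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k pool examples with the smallest |w.x + b|.

    Assumption: examples nearest the hyperplane are the most informative
    ones to send to a human annotator.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(decision_value(w, b, pool[i])),
    )
    return ranked[:k]


# Toy linear model and unlabelled pool (illustrative values only).
w = [1.0, -1.0]
b = 0.0
pool = [[3.0, 0.0], [0.5, 0.25], [0.0, 2.0], [1.0, 1.5]]

queries = select_queries(w, b, pool, k=2)
print(queries)  # indices of the two examples nearest the hyperplane
```

In an active learning loop, the selected examples would be labelled, added to the training set, and the classifier retrained before the next round of selection.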
