Learning Named Entity Classifiers Using Support Vector Machines

Traditional methods for named entity classification are based on hand-coded grammars, lists of trigger words, and gazetteers. While these methods achieve acceptable accuracy, they have a serious drawback: wider or more domain-specific coverage of named entities typically requires considerable human effort to redesign the grammars and revise the lists of trigger words or gazetteers. We present a method for improving the accuracy of a traditionally built named entity extractor. Support vector machines are used to train a classifier on the output of an existing extractor system. Experimental results show that this approach can be a very practical solution, increasing precision by up to 11.94% and recall by up to 27.83% without considerable human effort.
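The precision and recall figures quoted above can be read against the usual per-class definitions. The following is a minimal sketch of that arithmetic, not the evaluation procedure used in the paper: it assumes one label per entity mention, and the function name `precision_recall`, the label set, and the toy data are all hypothetical. The abstract does not say whether the reported gains are absolute or relative, so the sketch only shows how the two measures are computed for an extractor's output before and after reclassification.

```python
from typing import Sequence, Tuple


def precision_recall(
    gold: Sequence[str], predicted: Sequence[str], label: str
) -> Tuple[float, float]:
    """Return (precision, recall) for a single entity class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Mentions that both the gold standard and the system assign to `label`.
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    predicted_pos = sum(1 for p in predicted if p == label)
    gold_pos = sum(1 for g in gold if g == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


# Hypothetical toy data: one label per entity mention (not from the paper).
gold = ["PER", "ORG", "PER", "LOC", "PER", "ORG"]
baseline = ["PER", "PER", "ORG", "LOC", "PER", "ORG"]      # existing extractor
reclassified = ["PER", "ORG", "PER", "LOC", "PER", "LOC"]  # after a learned classifier

for name, output in (("baseline", baseline), ("reclassified", reclassified)):
    p, r = precision_recall(gold, output, "PER")
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the baseline labels two of three PER mentions correctly and makes one spurious PER prediction, while the reclassified output gets all three right; comparing the two pairs of scores is the kind of before-and-after measurement the abstract summarizes.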
