Named entity recognition in Vietnamese using classifier voting

Named entity recognition (NER) is one of the fundamental tasks in natural-language processing (NLP). Although classifier combination has been widely applied to several well-studied languages, this is the first time the method has been applied to Vietnamese. In this article, we describe how voting techniques can improve the performance of Vietnamese NER. By combining several state-of-the-art machine-learning algorithms with voting strategies, our final system outperforms each of the individual algorithms and achieves an F-measure of 89.12. We also present a detailed discussion of the challenges of NER in Vietnamese.
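The abstract does not say which voting strategies are used. The following is a minimal sketch, assuming token-level voting over BIO labels produced by several classifiers for the same token sequence, with optional per-classifier weights and ties resolved in favour of a designated classifier. The function name `vote`, its parameters, and the example labels are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def vote(
    predictions: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
    tie_breaker: int = 0,
) -> List[str]:
    """Combine per-token labels from several classifiers by (weighted) voting.

    predictions[i][t] is the label classifier i assigns to token t.
    weights[i] is classifier i's vote weight (hypothetical; defaults to 1.0,
    which gives plain majority voting).
    tie_breaker is the index of the classifier whose label wins a tie when
    it is among the tied labels (a hypothetical policy).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights: majority voting
    if len(weights) != len(predictions):
        raise ValueError("one weight per classifier is required")
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all classifiers must label the same tokens")

    combined: List[str] = []
    for t in range(n_tokens):
        labels = [p[t] for p in predictions]
        # Sum the weights of the classifiers voting for each label.
        score: Dict[str, float] = defaultdict(float)
        for label, w in zip(labels, weights):
            score[label] += w
        best = max(score.values())
        # Tied winners, listed in classifier order.
        tied = [lab for lab in labels if score[lab] == best]
        # Prefer the designated classifier's label if it is a winner,
        # otherwise take the first winner in classifier order.
        combined.append(labels[tie_breaker] if labels[tie_breaker] in tied else tied[0])
    return combined


if __name__ == "__main__":
    # Syllable tokens of "Nguyễn Văn An sống ở Hà Nội" (Nguyen Van An lives in Hanoi).
    a = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
    b = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC"]
    c = ["B-PER", "O", "I-PER", "O", "O", "B-ORG", "I-LOC"]
    print(vote([a, b, c]))
    # prints ['B-PER', 'I-PER', 'I-PER', 'O', 'O', 'B-LOC', 'I-LOC']
```

Because each token is decided independently, the combined sequence is not guaranteed to be well-formed BIO (for example, an `I-PER` directly after `O`), so a real system may need a repair step afterwards. Setting the weights from each classifier's held-out F-measure is one common choice, but that is an assumption here and not something the abstract states.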
