A Hybrid Approach of Pattern Extraction and Semi-supervised Learning for Vietnamese Named Entity Recognition

Contemporary research on Vietnamese Named Entity Recognition (NER) relies on supervised learning, which requires a large hand-annotated corpus; building such a corpus is challenging. We therefore propose a hybrid approach that combines pattern extraction with semi-supervised learning. A rule-based method generates patterns automatically: part-of-speech tagging, lexical diversity, and chunking are used to define the pattern-extraction rules, which identify potential named entities. Semi-supervised learning then uses a small set of seed named entities to categorize the named entities found in the extracted patterns. In experiments, our approach improves system accuracy compared with other approaches for Vietnamese.
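The abstract does not spell out the extraction rules or the learning algorithm. The Python code below is a minimal sketch of the general two-stage idea, not the authors' method, and it rests on several assumptions. The input is already word-segmented and POS-tagged with a VLSP-style tag set in which `Np` marks proper nouns. A candidate entity is a maximal run of `Np` tokens, which serves as a simple chunking rule. The "pattern" is just the word immediately to the left of the chunk. The semi-supervised step is a generic seed-based bootstrapping loop. Lexical diversity is omitted. The function names (`extract_candidates`, `bootstrap`), the `min_count` parameter, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: each sentence is a list of (token, POS) pairs, and the tag "Np"
# (proper noun) follows a VLSP-style tag set. Both are illustrative choices.
Sentence = list[tuple[str, str]]


def extract_candidates(sentence: Sentence) -> list[tuple[str, str]]:
    """Chunk maximal runs of proper-noun tokens into candidate entities.

    Returns (entity, pattern) pairs. The pattern is the lowercased word
    immediately left of the chunk, or "<s>" at sentence start.
    """
    results = []
    i = 0
    while i < len(sentence):
        if sentence[i][1] != "Np":
            i += 1
            continue
        j = i
        while j < len(sentence) and sentence[j][1] == "Np":
            j += 1
        entity = " ".join(tok for tok, _ in sentence[i:j])
        left = sentence[i - 1][0].lower() if i > 0 else "<s>"
        results.append((entity, left))
        i = j
    return results


def bootstrap(corpus, seeds, iterations=3, min_count=1):
    """Grow a labeled entity set from seed entities via shared patterns."""
    labels = dict(seeds)  # copy so the caller's seeds are not mutated
    pairs = [p for sent in corpus for p in extract_candidates(sent)]
    for _ in range(iterations):
        # 1. Each pattern votes for the labels of entities already labeled.
        votes = defaultdict(Counter)
        for entity, pattern in pairs:
            if entity in labels:
                votes[pattern][labels[entity]] += 1
        # Keep patterns whose majority label has at least min_count votes.
        pattern_label = {
            pat: cnt.most_common(1)[0][0]
            for pat, cnt in votes.items()
            if cnt.most_common(1)[0][1] >= min_count
        }
        # 2. Label unlabeled candidates that occur with a trusted pattern.
        new = {
            entity: pattern_label[pattern]
            for entity, pattern in pairs
            if entity not in labels and pattern in pattern_label
        }
        if not new:
            break  # nothing new learned; stop early
        labels.update(new)
    return labels


# Hypothetical toy corpus: "ông" (Mr.) precedes persons, "tại" (at) precedes places.
corpus = [
    [("ông", "N"), ("Nguyễn", "Np"), ("Văn", "Np"), ("An", "Np"),
     ("sống", "V"), ("tại", "E"), ("Hà", "Np"), ("Nội", "Np")],
    [("ông", "N"), ("Trần", "Np"), ("Bình", "Np"),
     ("làm_việc", "V"), ("tại", "E"), ("Đà", "Np"), ("Nẵng", "Np")],
]
seeds = {"Nguyễn Văn An": "PER", "Hà Nội": "LOC"}
labels = bootstrap(corpus, seeds)
print(labels)
```

With the two seeds, the loop learns that `ông` signals PER and `tại` signals LOC, then labels "Trần Bình" as PER and "Đà Nẵng" as LOC. A single left-context word keeps the sketch readable. A fuller rule set would combine POS tags, chunk boundaries, and lexical features, as the abstract describes. Patterns learned from few seeds can drift toward wrong categories, so bootstrapping systems typically keep only high-confidence patterns in each round. The hypothetical `min_count` threshold is a crude stand-in for that filtering.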
