Vietnamese Noun Phrase Chunking Based on Conditional Random Fields

Noun phrase chunking is an important and useful task in many natural language processing applications. It has been well studied for English; for Vietnamese, however, it remains an open problem. This paper presents a Vietnamese noun phrase chunking approach based on conditional random field (CRF) models. We also describe a method for building a Vietnamese corpus from a set of hand-annotated sentences. For evaluation, we perform several experiments using different feature settings. Results on our corpus show high performance, with average recall and precision of 82.72% and 82.62%, respectively.
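To make the reported recall and precision concrete, the following is a minimal sketch of how chunk-level scores are commonly computed for noun phrase chunking. It assumes the B-NP / I-NP / O tagging convention used in the CoNLL-2000 shared task; the abstract does not state which tag scheme or scoring script was used, and the function names `extract_np_chunks` and `precision_recall` are hypothetical.

```python
from typing import List, Set, Tuple


def extract_np_chunks(tags: List[str]) -> Set[Tuple[int, int]]:
    """Return NP spans as (start, end_exclusive) from B-NP / I-NP / O tags.

    The tag scheme is an assumption (CoNLL-2000 style), not taken from the paper.
    """
    chunks: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-NP":
            # A new chunk begins; close any chunk that is still open.
            if start is not None:
                chunks.add((start, i))
            start = i
        elif tag == "I-NP":
            # Tolerate an I-NP that has no preceding B-NP by opening a chunk here.
            if start is None:
                start = i
        else:
            # Any other tag (e.g. "O") closes the open chunk.
            if start is not None:
                chunks.add((start, i))
                start = None
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Chunk-level precision and recall: a chunk counts only on an exact span match."""
    gold_chunks = extract_np_chunks(gold)
    pred_chunks = extract_np_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    return precision, recall


# Toy example: the gold annotation has two NPs, the prediction finds only the first.
gold = ["B-NP", "I-NP", "O", "B-NP", "O"]
pred = ["B-NP", "I-NP", "O", "O", "O"]
p, r = precision_recall(gold, pred)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this scoring, a predicted chunk is correct only if both of its boundaries match a gold chunk exactly, so partially overlapping chunks count against both precision and recall.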