Comparative Study of Vietnamese Part-of-Speech Tagging Tools

Part-of-speech tagging is one of the most fundamental tasks in Vietnamese language processing. Unfortunately, no attempt has been made to empirically compare different Vietnamese part-of-speech tagging tools. Therefore, in this paper, the authors evaluate several Vietnamese part-of-speech tagging tools, including VnTagger, RDRPOSTagger (Java version), JvnTextPro, and VnCoreNLP, in terms of accuracy, consistency, and computational time. In addition, the underlying models are briefly described. The results help researchers understand the models' strengths and weaknesses. The tools are tested on four data sets that differ in the number of sentences and in the word types they contain, such as dates, numbers, special characters, connected characters, double words, compound words, and proper names. The results show that JvnTextPro achieves high and stable accuracy, ranging from 80.08% to 97.84%, while RDRPOSTagger has a faster processing time and relatively good accuracy, ranging from 88.41% to 96.84%.
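The accuracy and computational-time criteria named above can be illustrated with a minimal Python sketch. It assumes token-level accuracy (the share of tokens whose predicted tag matches the gold tag) and wall-clock timing. The abstract does not specify the exact evaluation procedure, so the function names, the callable tagger interface, and the sample tags are hypothetical and do not reflect the API of any of the compared tools.

```python
import time
from typing import Callable, List, Tuple

# A tagged sentence is a list of (word, tag) pairs. This layout is an
# assumption made for illustration, not a format used by the compared tools.
TaggedSentence = List[Tuple[str, str]]


def token_accuracy(gold: List[TaggedSentence],
                   predicted: List[TaggedSentence]) -> float:
    """Return the percentage of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must share the same tokenization")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += int(gold_tag == pred_tag)
            total += 1
    return 100.0 * correct / total if total else 0.0


def timed_tagging(tagger: Callable[[List[str]], TaggedSentence],
                  sentences: List[List[str]]) -> Tuple[List[TaggedSentence], float]:
    """Run a (hypothetical) tagger callable on each sentence and measure elapsed seconds."""
    start = time.perf_counter()
    output = [tagger(sentence) for sentence in sentences]
    elapsed = time.perf_counter() - start
    return output, elapsed
```

For example, if a gold sentence is tagged [("Tôi", "P"), ("học", "V")] and a tool returns [("Tôi", "P"), ("học", "N")], token_accuracy reports 50.0, since one of the two tags matches.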
