A comparative study of hidden Markov model and conditional random fields on a Yorùba part-of-speech tagging task

Part-of-speech tagging, the sequential labeling of the words in a sentence given their context, is a challenging problem because of both ambiguity and the unbounded nature of natural language vocabulary. Unlike English and most European languages, the Yorùba language has no publicly available part-of-speech tagging tool. In this paper, we compare the performance of variants of a bigram hidden Markov model (HMM) with that of a linear-chain conditional random field (CRF) on a Yorùba part-of-speech tagging task. We investigated the improvements obtainable from smoothing techniques and morphological affixes in the HMM-based models. For the CRF model, we defined feature functions that capture contexts similar to those available to the HMM-based models. Both kinds of models were trained and evaluated on the same data set. Experimental results show that the performance of both kinds of models is encouraging, with the CRF model recognizing more out-of-vocabulary (OOV) words than the best HMM model by a margin of 3.05%. The overall accuracy of the best HMM-based model is 83.62%, while that of the CRF is 84.66%. Although the CRF model gives marginally superior performance, both the HMM and CRF modeling approaches are clearly promising, given their OOV word recognition rates.
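To make the HMM side of the comparison concrete, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the authors' implementation: the abstract does not name the smoothing techniques used, so add-alpha (Laplace) smoothing is assumed here as a simple stand-in, morphological affixes are not modeled, and the function names and the toy English training data are purely hypothetical.

```python
import math
from collections import defaultdict


def train_bigram_hmm(tagged_sentences, alpha=1.0):
    """Estimate add-alpha smoothed transition and emission log-probabilities.

    Assumption: add-alpha smoothing is an illustrative choice, not
    necessarily the smoothing used in the paper.
    """
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag] counts
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word] counts
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    vocab_size = len(vocab) + 1  # one extra slot shared by all unseen words

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        count = trans[prev].get(tag, 0)
        return math.log((count + alpha) / (total + alpha * len(tags)))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        count = emit[tag].get(word, 0)
        return math.log((count + alpha) / (total + alpha * vocab_size))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for a non-empty word list."""
    # Each column maps tag -> (best log-probability ending in tag, backpointer).
    best = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy corpus (English, for readability only).
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
tags, log_trans, log_emit = train_bigram_hmm(toy_corpus)
seen_path = viterbi(["the", "cat", "barks"], tags, log_trans, log_emit)
oov_path = viterbi(["the", "fox", "barks"], tags, log_trans, log_emit)
```

In the second call, "fox" never appears in the toy corpus, so its smoothed emission probability is the same for every tag and the choice is driven by the transition probabilities alone. This is the basic mechanism by which a smoothed HMM can still assign a tag to an OOV word; a CRF can additionally use word-level features such as affixes when making that decision.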
