The study of a nonstationary maximum entropy Markov model and its application on the pos-tagging task

Sequence labeling is a core task in natural language processing, and the maximum entropy Markov model (MEMM) is a powerful tool for performing it. This article enhances the traditional MEMM by exploiting the positional information of language elements: the stationary hypothesis of MEMM is relaxed, and the nonstationary MEMM (NS-MEMM) is proposed. Several related issues are discussed in detail, including the representation of positional information, NS-MEMM implementation, smoothing techniques, and space complexity. An asymmetric NS-MEMM is also presented, which offers a more flexible way to exploit positional information. In the experiments, NS-MEMM is evaluated on both Chinese and English POS-tagging tasks. According to the experimental results, NS-MEMM improves on MEMM by exploiting positional information, the smoothing techniques in this article effectively solve the NS-MEMM data-sparseness problem, and the asymmetric NS-MEMM also yields an improvement by exploiting positional information more flexibly.
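
As a rough illustration of what relaxing the stationary hypothesis could mean, the sketch below contrasts the standard MEMM local distribution with one that also conditions on position. The first equation is the usual MEMM form. The second is only an assumed reading of the abstract: the position function p(t) and the way it enters the features f_i are hypothetical notation introduced here, not the article's own formulation.

```latex
% Standard (stationary) MEMM: the same local distribution is used at every position t.
P(s_t \mid s_{t-1}, o_t)
  = \frac{1}{Z(o_t, s_{t-1})}
    \exp\Big( \sum_i \lambda_i \, f_i(o_t, s_{t-1}, s_t) \Big),
\qquad
Z(o_t, s_{t-1}) = \sum_{s'} \exp\Big( \sum_i \lambda_i \, f_i(o_t, s_{t-1}, s') \Big)

% Assumed nonstationary variant: p(t) is a hypothetical encoding of the position
% of element t (for example, a bucketed index within the sentence).
P(s_t \mid s_{t-1}, o_t, p(t))
  = \frac{1}{Z(o_t, s_{t-1}, p(t))}
    \exp\Big( \sum_i \lambda_i \, f_i(o_t, s_{t-1}, s_t, p(t)) \Big)
```

Under this reading, each distinct value of p(t) adds its own set of features, which is consistent with the abstract's concern with data sparseness, smoothing, and space complexity.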
