Hidden Markov Chains, Entropic Forward-Backward, and Part-Of-Speech Tagging

The ability to take into account the characteristics, also called features, of observations is essential in Natural Language Processing (NLP) problems. The Hidden Markov Chain (HMC) model, associated with the classic Forward-Backward probabilities, cannot handle arbitrary features such as prefixes or suffixes of any size, except under an independence assumption. For twenty years, this shortcoming has encouraged the development of other sequential models, starting with the Maximum Entropy Markov Model (MEMM), which elegantly integrates arbitrary features. More generally, it has led to the neglect of HMCs in NLP. In this paper, we show that the problem lies not with the HMC itself, but with the way its restoration algorithms are computed. We present a new way of computing HMC-based restorations using original Entropic Forward and Entropic Backward (EFB) probabilities. Our method makes it possible to take features into account in the HMC framework in the same way as in the MEMM framework. We illustrate the efficiency of HMCs with EFB in Part-Of-Speech tagging, showing their superiority over MEMM-based restoration. We also outline, as a perspective, how HMCs with EFB might serve as an alternative to Recurrent Neural Networks for treating sequential data with a deep architecture.

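As background, the sketch below shows the classic Forward-Backward recursions that the abstract contrasts with EFB: they compute the posterior state marginals p(x_t | y_1..T), from which each hidden state is restored by a per-position argmax. It is illustrative only, under standard HMC notation (pi, A, B); none of it is taken from the authors' code, and the Entropic Forward-Backward probabilities introduced in the paper replace these recursions with variants whose exact formulas are given in the paper itself.

```python
# Minimal sketch of the classic (scaled) Forward-Backward algorithm for a
# discrete Hidden Markov Chain. Standard notation, not the paper's code:
# the paper's contribution is an "Entropic" variant of these recursions.
import numpy as np

def forward_backward(pi, A, B, obs):
    """Posterior state marginals p(x_t | y_1..T).

    pi  : (K,)   initial distribution p(x_1)
    A   : (K, K) transitions, A[i, j] = p(x_{t+1}=j | x_t=i)
    B   : (K, V) emissions,   B[i, v] = p(y_t=v | x_t=i)
    obs : (T,)   observed symbol indices y_1..y_T
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))   # scaled forward probabilities
    beta = np.zeros((T, K))    # scaled backward probabilities
    scale = np.zeros(T)        # per-step normalizers (avoid underflow)

    # Forward pass: alpha[t] proportional to p(x_t, y_1..t).
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass: beta[t] proportional to p(y_{t+1}..T | x_t).
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= scale[t + 1]

    # Posterior marginals; restoration takes argmax over states at each t.
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Note that in this classic form, the generative emission term B[:, obs[t]] is, roughly, where arbitrary overlapping observation features cannot be injected without an independence assumption; that limitation is what the paper's EFB reformulation targets.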