On Adjusted Viterbi Training

Abstract The EM algorithm is a principal tool for parameter estimation in hidden Markov models, where its efficient implementation is known as the Baum–Welch algorithm. This paper, however, is motivated by applications in which EM is replaced by Viterbi training, also called Viterbi extraction (VT) or the Baum–Viterbi algorithm. VT is computationally less intensive, more stable, and more intuitively appealing than EM, but its estimators are biased and inconsistent. In earlier work we proposed adjusted Viterbi training (VA), a method that reduces this bias and inconsistency while preserving the computational advantages of the baseline VT algorithm. The key difference between VA and VT is that, asymptotically, the true parameter values are a fixed point of VA (as they are of EM) but not of VT. We previously studied VA for the special case of Gaussian mixtures, with simulations illustrating its improved performance. The present work proves the asymptotic fixed-point property of VA for general hidden Markov models.
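
The fixed-point contrast between EM and VT can be illustrated numerically in the Gaussian mixture special case. The following is a minimal sketch, not the authors' VA method or their simulations: it assumes a two-component mixture with equal weights, unit variances, and means -1 and +1, estimates only the means, and treats VT as a hard-assignment update (each observation goes to the nearer mean). The function names `sample_mixture`, `vt_step`, and `em_step` are hypothetical.

```python
import math
import random


def sample_mixture(n, mu, seed=0):
    """Draw n points from 0.5*N(-mu, 1) + 0.5*N(+mu, 1) (assumed toy model)."""
    rng = random.Random(seed)
    return [rng.gauss(mu if rng.random() < 0.5 else -mu, 1.0) for _ in range(n)]


def vt_step(xs, mu_lo, mu_hi):
    """One hard-assignment (Viterbi-style) update of the two means.

    With equal weights and variances, the most likely component for x
    is simply the one whose mean is nearer, so we split at the midpoint.
    """
    mid = 0.5 * (mu_lo + mu_hi)
    lo = [x for x in xs if x < mid]
    hi = [x for x in xs if x >= mid]
    return sum(lo) / len(lo), sum(hi) / len(hi)


def em_step(xs, mu_lo, mu_hi):
    """One EM update of the two means (weights and variances held fixed)."""
    sw_hi = swx_hi = sw_lo = swx_lo = 0.0
    for x in xs:
        # Log posterior odds of the upper component versus the lower one.
        log_odds = -0.5 * (x - mu_hi) ** 2 + 0.5 * (x - mu_lo) ** 2
        w = 1.0 / (1.0 + math.exp(-log_odds))
        sw_hi += w
        swx_hi += w * x
        sw_lo += 1.0 - w
        swx_lo += (1.0 - w) * x
    return swx_lo / sw_lo, swx_hi / sw_hi


if __name__ == "__main__":
    xs = sample_mixture(200_000, mu=1.0)
    # Start both updates at the true means and apply a single step.
    print("EM step from truth:", em_step(xs, -1.0, 1.0))
    print("VT step from truth:", vt_step(xs, -1.0, 1.0))
```

Started at the true means, the EM update stays close to (-1, +1) on a large sample, whereas the hard-assignment update pushes the means apart; in this toy setting its large-sample limit is about ±1.17. Removing that kind of systematic displacement is what the adjustment in VA is meant to achieve, although the correction itself is not reproduced here.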
