Penalized Logistic Regression With HMM Log-Likelihood Regressors for Speech Recognition

Hidden Markov models (HMMs) are powerful generative models for sequential data that have been used in automatic speech recognition for more than two decades. Despite their popularity, HMMs make inaccurate assumptions about speech signals, which limits the achievable performance of the conventional speech recognizer. Penalized logistic regression (PLR) is a well-founded discriminative classifier with deep roots in statistics, and its classification performance is often compared with that of the popular support vector machine (SVM). For speech classification, however, only limited success with PLR has been reported, partly because of the difficulty of handling sequential data. In this paper, we present an elegant way of incorporating HMMs into the PLR framework, which leads to a powerful discriminative classifier that naturally handles sequential data. In this approach, speech classification is done using affine combinations of HMM log-likelihoods. We believe that such combinations of HMMs lead to a more accurate classifier than the conventional HMM-based classifier. Unlike similar approaches, we jointly estimate the HMM parameters and the PLR parameters using a single training criterion. The extension to continuous speech recognition is done via rescoring of N-best lists or lattices.
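The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the HMM log-likelihoods are made-up numbers held fixed (the joint estimation of HMM and PLR parameters is not shown), the function names `plr_posteriors` and `penalized_nll` are hypothetical, and the quadratic penalty on all weights including the bias is one common choice assumed here. Each class score is an affine combination of the HMM log-likelihoods, and a softmax turns the scores into class posteriors.

```python
import numpy as np


def plr_posteriors(loglik, W):
    """Class posteriors from affine combinations of HMM log-likelihoods.

    loglik: (N, M) array, row n holds log p(x_n | HMM_m) for M HMMs.
    W:      (K, M + 1) array, column 0 is the bias term of each class.
    """
    # Regressor vector: a constant 1 followed by the M log-likelihoods.
    phi = np.hstack([np.ones((loglik.shape[0], 1)), loglik])
    scores = phi @ W.T  # (N, K) affine combinations
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)


def penalized_nll(loglik, labels, W, delta):
    """Negative conditional log-likelihood plus an assumed quadratic penalty."""
    p = plr_posteriors(loglik, W)
    nll = -np.log(p[np.arange(len(labels)), labels]).sum()
    return nll + 0.5 * delta * np.sum(W ** 2)


# Illustrative (invented) log-likelihoods for 3 utterances under 3 HMMs.
loglik = np.array([[-120.0, -135.0, -140.0],
                   [-150.0, -128.0, -149.0],
                   [-160.0, -158.0, -131.0]])
labels = np.array([0, 1, 2])

# Zero bias and identity weights: each class score is just its own HMM
# log-likelihood, so the arg-max decision matches a maximum-likelihood
# HMM classifier with equal class priors.
K, M = 3, 3
W_identity = np.hstack([np.zeros((K, 1)), np.eye(M)])

posteriors = plr_posteriors(loglik, W_identity)
predicted = posteriors.argmax(axis=1)
objective = penalized_nll(loglik, labels, W_identity, delta=0.1)
```

With `W_identity` the decision reduces to picking the HMM with the highest log-likelihood. Training would instead adjust `W` (and, in the approach described above, the HMM parameters as well) to minimize the penalized objective.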
