Discriminant-function-based minimum recognition error rate pattern-recognition approach to speech recognition

A discriminant-function-based minimum recognition error rate pattern recognition approach is described and studied for various applications in speech processing. This approach departs from the conventional paradigm, which links a classification or recognition task to the problem of distribution estimation. Instead, it takes a discriminant-function-based statistical pattern recognition approach, whose suitability for minimizing the classification error rate is established through a special loss function. The approach remains meaningful even when the model correctness assumption is known to be invalid. We study the theoretical basis of this approach and compare it with various criteria used in speech recognition. We distinguish classifier design by way of distribution estimation from the discriminant function methods of minimizing the classification error rate. The distinction rests on the fact that in many realistic applications, such as speech recognition, the true distributional form of the source is rarely known precisely, and without the model correctness assumption the classical optimality theory of the distribution estimation approach cannot be applied directly. We discuss issues in this new classifier design paradigm and present various extensions of the approach to classifier design applications in speech processing.
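
The abstract does not state the form of the special loss function. As a rough illustration only, the sketch below uses a form that is common in the minimum classification error literature: a misclassification measure that compares the score of the correct class with a soft maximum over the competing classes, passed through a sigmoid so that the loss approximates a 0/1 error count while remaining differentiable. The function names, the parameters `eta`, `gamma`, and `theta`, and their default values are assumptions made for this example, not details taken from the paper.

```python
import math


def misclassification_measure(scores, label, eta=2.0):
    """Assumed form: d_k(x) = -g_k(x) + (1/eta) * log(mean_{j != k} exp(eta * g_j(x))).

    scores: discriminant function values g_j(x), one per class.
    label:  index k of the correct class.
    A negative value suggests a correct decision, a positive value an error.
    """
    competitors = [g for j, g in enumerate(scores) if j != label]
    m = max(competitors)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (g - m)) for g in competitors) / len(competitors)
    soft_max = m + math.log(mean_exp) / eta
    return -scores[label] + soft_max


def smoothed_loss(d, gamma=1.0, theta=0.0):
    """Sigmoid of the misclassification measure: 1 / (1 + exp(-gamma * d + theta))."""
    z = gamma * d - theta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow when z is very negative
    return e / (1.0 + e)


def empirical_loss(samples, eta=2.0, gamma=1.0, theta=0.0):
    """Average smoothed loss over (scores, label) pairs."""
    total = 0.0
    for scores, label in samples:
        d = misclassification_measure(scores, label, eta)
        total += smoothed_loss(d, gamma, theta)
    return total / len(samples)


if __name__ == "__main__":
    # Hypothetical discriminant scores for three classes.
    data = [([2.0, 0.5, -1.0], 0), ([2.0, 0.5, -1.0], 1)]
    print(empirical_loss(data))
    # A steep sigmoid makes the average approach the raw error rate.
    print(empirical_loss(data, gamma=1000.0))
```

Under these assumptions, a larger `gamma` steepens the sigmoid so that the averaged loss approaches the empirical error count, while a smaller `gamma` gives a smoother objective that is easier to minimize with gradient-based methods. Nothing in the loss requires the discriminant functions to be correct probability models, which is the point the abstract makes about the model correctness assumption.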
