Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method

This paper provides a comprehensive introduction to a novel approach to pattern recognition based on the generalized probabilistic descent (GPD) method and its related design algorithms. The paper surveys recent recognizer design techniques; formulates GPD; presents the concept of minimum classification error learning, which is closely related to the GPD formalization; analyzes the relationship between GPD and other important design methods; and describes various embodiments of GPD-based design, including segmental-GPD, minimum spotting error training, discriminative utterance verification, and discriminative feature extraction. GPD has its origins in basic pattern recognition and Bayes decision theory: it is a simple but careful re-investigation of the classical theory that leads to a new design framework. For clarity of presentation, the embodiments are discussed in detail using speech pattern recognition tasks that use a distance-based classifier. Experimental results on these tasks demonstrate the utility of the family of GPD-based design algorithms.
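To make the idea of minimum classification error learning with a distance-based classifier more concrete, the following is a minimal sketch and not the paper's own algorithm or notation. It assumes one prototype per class, squared Euclidean distance as the (negated) discriminant, a misclassification measure that compares the correct class only with its nearest rival, a sigmoid loss, and a linearly decaying step size. All function and parameter names (`mce_gpd_step`, `alpha`, `eps0`, and so on) are hypothetical.

```python
import math
import random


def sq_dist(x, m):
    """Squared Euclidean distance between a sample x and a prototype m."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))


def classify(x, protos):
    """Distance-based decision: index of the nearest prototype."""
    return min(range(len(protos)), key=lambda j: sq_dist(x, protos[j]))


def mce_gpd_step(x, k, protos, alpha=1.0, eps=0.05):
    """One probabilistic-descent update for sample x with true class k.

    Assumed misclassification measure: d = dist(correct) - dist(nearest rival),
    so d > 0 means x is misclassified. The loss is a sigmoid of d, a smooth
    stand-in for the 0/1 classification error. Updates protos in place.
    """
    dists = [sq_dist(x, m) for m in protos]
    # Nearest incorrect class (assumption: only the best rival competes).
    r = min((j for j in range(len(protos)) if j != k), key=lambda j: dists[j])
    d = dists[k] - dists[r]
    z = max(-50.0, min(50.0, alpha * d))  # clamp to keep exp() well-behaved
    loss = 1.0 / (1.0 + math.exp(-z))
    # Chain rule: d(loss)/d(d) = alpha * loss * (1 - loss), and the gradient
    # of a squared distance with respect to its prototype is -2 * (x - m).
    w = 2.0 * eps * alpha * loss * (1.0 - loss)
    m_k, m_r = protos[k], protos[r]
    protos[k] = [mi + w * (xi - mi) for xi, mi in zip(x, m_k)]  # pull correct
    protos[r] = [mi - w * (xi - mi) for xi, mi in zip(x, m_r)]  # push rival
    return loss


def train(data, protos, epochs=50, alpha=1.0, eps0=0.05, seed=0):
    """Sample-by-sample training over (x, label) pairs; returns new prototypes."""
    rng = random.Random(seed)
    protos = [list(m) for m in protos]  # work on a copy
    data = list(data)
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, k in data:
            eps = eps0 * (1.0 - t / total)  # assumed linear step-size decay
            mce_gpd_step(x, k, protos, alpha, eps)
            t += 1
    return protos
```

In this sketch the update is largest for samples near the decision boundary, where the sigmoid is steepest, and vanishes for samples that are classified correctly or incorrectly by a wide margin. With these particular assumptions the rule resembles an LVQ-style pull/push on the correct and rival prototypes.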
