Discriminative feature weighting for HMM-based continuous speech recognizers

The Discriminative Feature Extraction (DFE) method provides an appropriate formalism for designing the front-end feature extraction module of pattern classification systems. In recent years, this formalism has been successfully applied to several speech recognition problems, such as vowel classification, phoneme classification, and isolated word recognition. DFE can also be used to weight the contribution of each component of the feature vector. This variant of DFE, which we call Discriminative Feature Weighting (DFW), improves pattern classification systems by enhancing the components that are most relevant for discriminating among the classes. This paper applies the DFW formalism to Continuous Speech Recognizers (CSRs) based on Hidden Markov Models (HMMs). Two types of HMM-based recognizers are considered: recognizers based on Discrete HMMs (DHMMs), for which the acoustic evaluation is based on a Euclidean distance measure, and recognizers based on Semi-Continuous HMMs (SCHMMs), for which the acoustic evaluation uses a mixture of multivariate Gaussians. We describe how the feature components can be weighted, how the weights can be discriminatively trained, and how they are applied in each recognizer. We present recognition results for several continuous speech recognition tasks, which show that DFW is useful for HMM-based continuous speech recognizers.
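
To make the idea of discriminatively trained component weights concrete, here is a minimal sketch of the distance-based (DHMM-style) case. It is not the authors' formulation, and several simplifications are assumptions made only for illustration. The classifier is a nearest-prototype rule with one prototype per class, not an HMM. It uses a weighted squared Euclidean distance. The weights are kept positive through the parameterization w_k = exp(theta_k). Training uses a minimum-classification-error style loss: a sigmoid of the distance to the correct prototype minus the distance to the closest rival. The loss is minimized by stochastic gradient descent. All function names and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_sq_dist(x, p, w):
    """Weighted squared Euclidean distance: sum_k w_k * (x_k - p_k)**2."""
    return sum(wk * (xk - pk) ** 2 for xk, pk, wk in zip(x, p, w))


def classify(x, prototypes, w):
    """Index of the nearest prototype under the weighted distance."""
    dists = [weighted_sq_dist(x, p, w) for p in prototypes]
    return min(range(len(dists)), key=dists.__getitem__)


def accuracy(data, prototypes, w):
    """Fraction of (x, label) pairs classified correctly."""
    hits = sum(classify(x, prototypes, w) == label for x, label in data)
    return hits / len(data)


def mce_train_weights(data, prototypes, dim, epochs=20, lr=0.1, alpha=1.0, seed=0):
    """Learn positive weights w_k = exp(theta_k) by MCE-style gradient descent.

    Misclassification measure d = D(x, correct) - D(x, closest rival), so d > 0
    means an error; the per-sample loss is sigmoid(alpha * d).
    """
    rng = random.Random(seed)
    theta = [0.0] * dim  # start from uniform weights w_k = 1
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, label in samples:
            w = [math.exp(t) for t in theta]
            dists = [weighted_sq_dist(x, p, w) for p in prototypes]
            rival = min((j for j in range(len(prototypes)) if j != label),
                        key=dists.__getitem__)
            loss = sigmoid(alpha * (dists[label] - dists[rival]))
            slope = alpha * loss * (1.0 - loss)  # derivative of the loss w.r.t. d
            for k in range(dim):
                # dd/dtheta_k = w_k * [(x_k - p_correct,k)^2 - (x_k - p_rival,k)^2]
                grad = w[k] * ((x[k] - prototypes[label][k]) ** 2
                               - (x[k] - prototypes[rival][k]) ** 2)
                theta[k] -= lr * slope * grad
    return [math.exp(t) for t in theta]


def make_toy_data(n, seed=0):
    """Hypothetical 2-D data: component 0 separates the classes, component 1 is noise."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        data.append(((label + rng.gauss(0.0, 0.2), rng.gauss(0.5, 1.0)), label))
    return data


if __name__ == "__main__":
    data = make_toy_data(200, seed=1)
    prototypes = [(0.0, 0.0), (1.0, 1.0)]
    print("uniform-weight accuracy:", accuracy(data, prototypes, [1.0, 1.0]))
    weights = mce_train_weights(data, prototypes, dim=2)
    print("learned weights:", weights)
    print("weighted accuracy:", accuracy(data, prototypes, weights))
```

On this toy set the training should raise the weight of the discriminative component and shrink the weight of the noisy one, which is the behavior DFW aims for. The sketch covers only the distance-based case. How the weights enter the Gaussian mixture evaluation of an SCHMM is not reproduced here.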