Discriminative auditory-based features for robust speech recognition

Recently, a new auditory-based feature extraction algorithm for robust speech recognition in noisy environments was proposed. The features are derived by closely mimicking the human peripheral auditory process, and the filters modeling the outer, middle, and inner ear are taken from the psychoacoustics literature with some manual adjustments. In this paper, we extend that algorithm by further training the auditory-based filters discriminatively. Using a data-driven approach, we optimize the filters by minimizing the resulting recognition errors on a given task. One significant contribution over similar past efforts (generally known as "discriminative feature extraction") is that we make no assumption about the parametric form of the auditory-based filters. Instead, we require only that the filters be triangular-like: the filter weights reach their maximum in the middle and decrease monotonically toward both ends. Discriminative training of these constrained auditory-based filters improves recognition performance. Furthermore, we study combined discriminative training of both the feature and the acoustic model parameters. Our experiments show that the best performance is obtained with a sequential procedure under the unified framework of minimum classification error training with generalized probabilistic descent (MCE/GPD).
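
The abstract does not say how the triangular-like constraint is enforced during training or how the MCE loss is computed, so the following is only a minimal Python sketch under our own assumptions. The filter shape is guaranteed by a hypothetical reparameterization (a log peak height plus per-step decay factors in (0, 1) going outward from the peak), the loss is the standard sigmoid-smoothed MCE loss of Juang and Katagiri, and GPD is approximated by fixed-step gradient descent with finite-difference gradients. The function names, toy spectra, and class targets are illustrative and are not the authors' implementation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def triangular_like_filter(params, n_left):
    """Map unconstrained params to positive filter weights that peak at index
    n_left and decrease monotonically toward both ends.

    Hypothetical parameterization: params[0] is the log peak height; the next
    n_left entries set per-step decay factors sigmoid(a) in (0, 1) going left
    from the peak, and the remaining entries do the same going right.
    """
    peak = math.exp(params[0])
    left_p, right_p = params[1:1 + n_left], params[1 + n_left:]
    left, w = [], peak
    for a in reversed(left_p):  # walk outward from the peak toward index 0
        w *= sigmoid(a)
        left.append(w)
    right, w = [], peak
    for b in right_p:  # walk outward from the peak toward the last index
        w *= sigmoid(b)
        right.append(w)
    return left[::-1] + [peak] + right


def mce_loss(scores, correct, eta=2.0, gamma=1.0, theta=0.0):
    """Sigmoid-smoothed MCE loss for one token.

    scores[j] is the discriminant g_j(x); correct is the true class index.
    The misclassification measure compares g_correct with the soft maximum
    (1/eta) * log(mean over competitors of exp(eta * g_j)).
    """
    g_c = scores[correct]
    comp = [g for j, g in enumerate(scores) if j != correct]
    m = max(comp)  # shift for numerical stability
    soft_max = m + math.log(sum(math.exp(eta * (g - m)) for g in comp) / len(comp)) / eta
    d = -g_c + soft_max  # d > 0 roughly means the token is misclassified
    return sigmoid(gamma * d + theta)


def gpd_step(params, loss_fn, lr=0.1, h=1e-5):
    """One fixed-step gradient-descent update, using central finite
    differences in place of an analytic gradient."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += h
        down[i] -= h
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * h))
    return [p - lr * g for p, g in zip(params, grad)]


# Toy usage (hypothetical data): one 5-tap filter centred on bin 2 yields a
# log-energy feature; class k scores it as -(feature - TARGETS[k]) ** 2.
SPECTRA = [([1.0, 3.0, 1.0, 0.5, 0.2], 0), ([0.2, 0.5, 1.0, 3.0, 1.0], 1)]
TARGETS = [1.5, 0.5]


def total_loss(params):
    w = triangular_like_filter(params, n_left=2)
    loss = 0.0
    for spec, label in SPECTRA:
        feat = math.log(sum(wi * si for wi, si in zip(w, spec)))
        scores = [-(feat - t) ** 2 for t in TARGETS]
        loss += mce_loss(scores, label)
    return loss


params = [0.0] * 5
for _ in range(50):
    params = gpd_step(params, total_loss)
print(triangular_like_filter(params, n_left=2))
```

Because every real-valued parameter vector maps to a valid triangular-like filter, any gradient update, including a GPD step, keeps the constraint without a separate projection. The trade-off of this particular choice is that the peak position is fixed at index `n_left`; the abstract's weaker requirement (a maximum in the middle, monotone decrease to both ends) could also be met by other schemes, such as projecting the weights after each update.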
