Spectral Dynamics for Speech Recognition Under Adverse Conditions

Significant improvements in automatic speech recognition performance have been obtained through front-end feature representations that exploit the time-varying properties of speech spectra. Various techniques have been developed to incorporate “spectral dynamics” into the speech representation, including temporal derivative features, spectral mean normalization and, more generally, spectral parameter filtering. This chapter describes the implementation and interrelationships of these techniques and illustrates their use in automatic speech recognition under different types of adverse conditions.
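The interrelationship between derivative features and mean normalization can be seen in a few lines of code. Both are linear operations applied along the time trajectory of each spectral parameter, so both are instances of spectral parameter filtering. The sketch below is a minimal illustration, not the chapter's implementation. It assumes cepstral-domain frames, the common least-squares regression formula for delta features, d_t = sum_j j (c_{t+j} - c_{t-j}) / (2 sum_j j^2) with a window of k = 2 frames, and edge frames padded by repetition. The helper names (`regression_delta`, `mean_normalize`) and the channel values are hypothetical. A fixed linear channel whose impulse response is short relative to the analysis window adds an approximately constant vector to every cepstral frame. Both operations remove such a constant, so they yield identical features for the "clean" and "distorted" toy inputs.

```python
from typing import List

Frame = List[float]  # one vector of spectral (e.g. cepstral) parameters per frame


def regression_delta(frames: List[Frame], k: int = 2) -> List[Frame]:
    """Least-squares slope of each parameter trajectory over a +/-k frame window.

    Frames beyond the utterance edges are replaced by the first or last frame
    (an assumed edge policy; other padding choices are also used).
    """
    n = len(frames)
    denom = 2 * sum(j * j for j in range(1, k + 1))
    deltas = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for j in range(1, k + 1):
                ahead = frames[min(t + j, n - 1)][d]
                behind = frames[max(t - j, 0)][d]
                acc += j * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Subtract the utterance mean from each parameter trajectory."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


# Toy trajectories: dimension 0 rises by 1 per frame, dimension 1 by 2 per frame.
clean = [[float(t), 2.0 * t] for t in range(6)]
# Hypothetical fixed channel: a constant offset added to every cepstral frame.
channel = [0.5, -1.0]
distorted = [[c + h for c, h in zip(f, channel)] for f in clean]

delta_clean = regression_delta(clean)
delta_distorted = regression_delta(distorted)
cmn_clean = mean_normalize(clean)
cmn_distorted = mean_normalize(distorted)
```

The two operations differ in which parts of the trajectory they pass. Mean subtraction removes only the constant (zero-frequency) component and leaves all other variation intact. The regression delta also rejects the constant component, but it is an antisymmetric FIR filter across frames, so it emphasizes change over time and discards the absolute spectral level.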