Automatic word recognition in cars

This paper compares, on a database recorded in a car, several signal analysis and speech enhancement techniques, as well as several approaches to adapting speech recognition systems. A new nonlinear spectral subtraction combined with Mel-frequency cepstral coefficients (MFCC) is shown to be an adequate compromise for low-cost integration. The Lombard effect is analyzed and simulated, and the simulation is used to derive realistic training utterances from noise-free utterances. Adapting a continuous-density hidden Markov model (CDHMM) to these artificially generated training samples yields very high performance relative to that achieved within the ESPRIT adverse environment recognition of speech (ARS) project, i.e., an average error of 1% over all driving conditions. Finally, the paper shows, both theoretically and experimentally, that whatever noise estimation technique is used, it is better to add the noise estimate to the clean reference models than to subtract it from the noisy data.
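The contrast drawn in the final sentence can be illustrated with a minimal sketch. It is not the paper's nonlinear spectral subtraction or its CDHMM adaptation; it uses a generic power spectral subtraction with a spectral floor and assumes speech and noise powers add. The function names, the `alpha` and `floor` parameters, and the toy spectra are all hypothetical and chosen only for illustration.

```python
import numpy as np

def spectral_subtract(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Generic power spectral subtraction (illustrative, not the paper's variant).

    Subtracts a scaled noise estimate from the noisy power spectrum and
    clamps the result to a small fraction of the noisy power, because a
    plain subtraction can go negative when the noise is overestimated.
    """
    cleaned = noisy_power - alpha * noise_power
    return np.maximum(cleaned, floor * noisy_power)

def add_noise_to_model(clean_model_power, noise_power):
    """Compensate a clean reference spectrum by adding the noise estimate.

    Assumes speech and noise are uncorrelated, so their powers add.
    The result is positive whenever both inputs are positive.
    """
    return clean_model_power + noise_power

# Toy three-bin power spectra (hypothetical values).
clean = np.array([4.0, 0.1, 2.0])
noise_true = np.array([1.0, 1.0, 1.0])
noise_est = np.array([1.2, 1.2, 1.2])   # slightly overestimated noise
noisy = clean + noise_true

enhanced = spectral_subtract(noisy, noise_est)     # operates on the noisy data
compensated = add_noise_to_model(clean, noise_est) # operates on the clean model

print("enhanced   :", enhanced)
print("compensated:", compensated)
```

In this toy case the second bin of the subtracted spectrum falls below zero and is clamped to the floor, while the compensated model needs no clamping. This only hints at why the two directions behave differently; the paper's own theoretical and experimental argument is not reproduced here.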
