Robust Technologies towards Automatic Speech Recognition in Car Noise Environments

This paper presents research on robust automatic speech recognition (ASR) in car noise environments. In the front-end design, speech enhancement is used to suppress background noise in the frequency domain, and the spectrum is then smoothed along both the time and frequency indices to compensate for spectral components distorted by noise over-reduction. In acoustic model training, we propose an immunity learning scheme in which pre-recorded car noises are artificially added to clean training utterances at different signal-to-noise ratios (SNRs) to imitate in-car environments. After analyzing the SNR and noise spectrum of real in-car utterances, we further refine the immunity training set by adjusting the SNR distribution and increasing the proportion of training noises whose characteristics are similar to those of the real in-car noise. Evaluation results for isolated phrase recognition show that the ASR system with the proposed technologies achieves an average error rate reduction (ERR) of 90.68% for artificially noise-corrupted speech and 79.08% for real in-car speech, compared with a baseline system that uses no robust technology.
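The abstract does not spell out how noise is added at a given SNR, so the following is only a minimal sketch of one common way to do it: scale the noise so that the ratio of clean-signal power to noise power matches the target SNR, then sum the two signals. The function names (`power`, `mix_at_snr`, `measured_snr_db`) are hypothetical, and a synthetic sine tone and Gaussian noise stand in for real training utterances and recorded car noise.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` yields the requested SNR (dB).

    Returns the noisy signal and the scaled noise that was added.
    This is an illustrative assumption, not the paper's exact procedure.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = power(clean)
    p_noise = power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(p_clean / (gain^2 * p_noise)); solve for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def measured_snr_db(clean, scaled_noise):
    """SNR in dB of a clean signal relative to the noise actually added."""
    return 10 * math.log10(power(clean) / power(scaled_noise))


# Synthetic stand-ins: a 440 Hz tone at 16 kHz and white Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0, 1) for _ in range(16000)]
noisy, scaled = mix_at_snr(clean, noise, snr_db=10)
```

Under these assumptions, a noise-augmented training set could be built by repeating the call over a chosen set of SNR values and noise recordings; weighting those choices toward the SNRs and noise types seen in real in-car data would correspond to the refinement step described above.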
