Robust speech recognition with multi-channel codebook dependent cepstral normalization (MCDCN)

We address speech recognition in the presence of interfering signals, in cases where the signals corrupting the speech are also recorded in separate channels. We propose combining a trivial form of filtering with MCDCN, a multi-channel version of codebook dependent cepstral normalization in which the cepstra of the noise are estimated from the reference signals. We report recognition experiments in a car where the speech signal is corrupted by talk radio or CD music played through the car speakers. Our approach yields relative word error rate reductions of 70-90% compared to a no-compensation baseline, at a relatively low computational cost.
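
To make the "relative word error rate reduction" figure concrete, the following minimal sketch shows how such a number is conventionally computed from a baseline WER and a compensated WER. The function name `relative_wer_reduction` and the WER values are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, compensated_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - compensated_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs (in percent), for illustration only; not from the paper.
    baseline = 20.0      # assumed WER with no compensation
    compensated = 4.0    # assumed WER after compensation
    reduction = relative_wer_reduction(baseline, compensated)
    print(f"relative WER reduction: {reduction:.0%}")  # prints "relative WER reduction: 80%"
```

Under this definition, a reduction of 70-90% means the compensated system makes between roughly one third and one tenth as many word errors as the no-compensation baseline.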
