Alignment-based codeword-dependent cepstral normalization

This paper proposes the alignment-based codeword-dependent cepstral normalization algorithm (ACDC_N), which aims to alleviate the acoustical mismatch that occurs when a speech recognizer faces environmental conditions not observed in the training data. ACDC_N is based on the linear channel model of the environment originally proposed by Acero (1990) and on the CDCN solution to this model. ACDC_N replaces the codebook (Gaussian mixture model) employed by CDCN with the state distributions of the recognizer's HMMs, under the assumption that these HMM distributions model the associated speech segments better than the general GMM does. The association between feature frames and HMM states is obtained by aligning a first decoding-pass hypothesis. From this alignment, ACDC_N estimates the environmental parameters (noise and channel vectors), which are then used to obtain an MMSE estimate of the clean speech vectors, in a way similar to Acero's method. In experiments on the Aurora-2 noisy digits database, ACDC_N reduces the overall error rate by over 30% in the 0 to 20 dB SNR range.
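The MMSE step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the log-spectral form of the environment model, y = x + h + log(1 + exp(n - x - h)). It works on log-spectral rather than cepstral vectors and treats the noise n and channel h as already estimated. It also leaves the Gaussian variances uncompensated. The names `env_correction`, `acdcn_mmse` and the `states` dictionary are hypothetical. Following CDCN, each Gaussian k receives a correction r_k = h + log(1 + exp(n - mu_k - h)), and the clean estimate is x_hat_t = y_t - sum_k p(k | y_t) r_k. The alignment-based change is that k ranges only over the Gaussians of the HMM state aligned to frame t, rather than over one global codebook.

```python
import numpy as np


def env_correction(mu, n, h):
    """Per-Gaussian mismatch r = h + log(1 + exp(n - mu - h)).

    Assumed log-spectral environment model (sketch only).
    mu: (K, D) clean-speech Gaussian means; n, h: (D,) noise and channel.
    """
    # np.logaddexp(0, z) computes log(1 + e^z) without overflow
    return h + np.logaddexp(0.0, n - mu - h)


def log_gauss_diag(y, mean, var):
    """log N(y; mean, diag(var)) evaluated for each row of mean/var."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var, axis=-1)


def acdcn_mmse(Y, alignment, states, n, h):
    """Alignment-restricted MMSE estimate x_hat_t = y_t - sum_k p(k|y_t) r_k.

    Y: (T, D) noisy log-spectral frames.
    alignment: length-T sequence of state ids from a first-pass alignment.
    states: hypothetical dict state_id -> (weights (K,), means (K, D), vars (K, D)).
    n, h: (D,) noise and channel vectors, assumed already estimated.
    """
    Y = np.asarray(Y, dtype=float)
    X_hat = np.empty_like(Y)
    for t, (y, s) in enumerate(zip(Y, alignment)):
        w, mu, var = states[s]                  # only the aligned state's Gaussians
        r = env_correction(mu, n, h)            # (K, D) mismatch per Gaussian
        # Noisy-speech Gaussian approximated as N(mu + r, var); variances untouched
        log_post = np.log(w) + log_gauss_diag(y, mu + r, var)
        post = np.exp(log_post - log_post.max())
        post /= post.sum()                      # p(k | y_t) within the state
        X_hat[t] = y - post @ r                 # subtract expected correction
    return X_hat
```

Passing the same global GMM as `states[s]` for every frame would turn this into a CDCN-style estimator; restricting the posterior to the aligned state's Gaussians is the substitution the abstract describes.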

[1] George Saon, et al. Robust digit recognition in noisy environments: the IBM Aurora 2 system, 2001, INTERSPEECH.

[2] Mazin G. Rahim, et al. Integrated bias removal techniques for robust speech recognition, 1999, Comput. Speech Lang.

[3] Frederick Jelinek, et al. Statistical methods for speech recognition, 1997.

[4] Li Deng, et al. HMM adaptation using vector Taylor series for noisy speech recognition, 2000, INTERSPEECH.

[5] George Saon, et al. Maximum likelihood discriminant feature spaces, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[6] Pedro J. Moreno, et al. Speech recognition in noisy environments, 1996.

[7] Mark J. F. Gales, et al. Robust speech recognition in additive and convolutional noise using parallel model combination, 1995, Comput. Speech Lang.

[8] Chin-Hui Lee, et al. A maximum-likelihood approach to stochastic matching for robust speech recognition, 1996, IEEE Trans. Speech Audio Process.

[9] Fu-Hua Liu, et al. Environmental adaptation for robust speech recognition, 1995.

[10] Alejandro Acero, et al. Acoustical and environmental robustness in automatic speech recognition, 1991.

[11] Bhaskar D. Rao, et al. Techniques for capturing temporal variations in speech signals with fixed-rate processing, 1998, ICSLP.

[12] David Pearce, et al. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, 2000, INTERSPEECH.

[13] Mark J. F. Gales, et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.