Maximum mutual information SPLICE transform for seen and unseen conditions

SPLICE is a front-end technique for automatic speech recognition systems: a non-linear feature-space transformation intended to increase recognition accuracy. Our previous work showed how to train SPLICE to perform speech feature enhancement. This paper evaluates a maximum mutual information (MMI) based discriminative training method for SPLICE. Discriminative techniques tend to excel when the training and testing data are similar, and to degrade performance significantly when they differ. This paper explores both cases, seen and unseen conditions, in detail using the Aurora 2 corpus. The overall recognition accuracy of the MMI-SPLICE system is slightly better than that of the ETSI Advanced Front End standard, and much better than that of previous SPLICE training algorithms. Most notably, it achieves this without explicitly resorting to the standard techniques of environment modeling, noise modeling, or spectral subtraction.
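To make "non-linear feature-space transformation" concrete, the sketch below shows one commonly described form of a SPLICE-style transform: a Gaussian mixture model over the input feature vector selects, via its posteriors, a weighted combination of per-component bias vectors that is added to the input. This is an illustrative assumption, not the configuration evaluated in the paper; the function names, the diagonal-covariance mixture, and the bias-only correction are all hypothetical choices, and the MMI training of the biases is not shown.

```python
import math


def log_gauss_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector y."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )


def posteriors(y, weights, means, variances):
    """Component posteriors p(k | y) under the mixture (log-sum-exp for stability)."""
    logs = [
        math.log(w) + log_gauss_diag(y, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def splice_transform(y, weights, means, variances, biases):
    """Return y plus the posterior-weighted sum of per-component bias vectors.

    Assumed form: x_hat = y + sum_k p(k | y) * r_k. The result is non-linear
    in y because the posteriors p(k | y) themselves depend on y.
    """
    post = posteriors(y, weights, means, variances)
    return [
        yi + sum(p * b[d] for p, b in zip(post, biases))
        for d, yi in enumerate(y)
    ]
```

In a discriminatively trained variant, the bias vectors would be the parameters adjusted to maximize the MMI objective, rather than being estimated from paired clean and noisy data.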
