Modified SPLICE and its extension to non-stereo data for noise robust speech recognition

This paper proposes a modification to the training process of the popular SPLICE algorithm for noise robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially unseen ones. The modified framework is then extended to non-stereo datasets, which require clean and noisy training utterances but not their stereo counterparts. Finally, a computationally efficient MLLR-based run-time noise adaptation method within the SPLICE framework is proposed. The modified SPLICE shows an 8.6% absolute improvement over SPLICE on Test C of the Aurora-2 database, and 2.93% overall. The non-stereo method shows 10.37% and 6.93% absolute improvements over the Aurora-2 and Aurora-4 baseline models, respectively. Run-time adaptation in the modified framework shows a 9.89% absolute improvement over SPLICE on Test C, and 4.96% overall with respect to standard MLLR adaptation on HMMs.
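
For orientation, the piecewise-linear mapping at the core of SPLICE, as it is commonly described in the literature, can be sketched as below. This is only the generic form: a GMM over noisy features selects, by posterior weight, among per-component affine corrections. It is not the modified training procedure, the non-stereo extension, or the MLLR-based adaptation proposed here. The function names, the array shapes, and the diagonal-covariance GMM are assumptions made for illustration.

```python
import numpy as np

def gmm_posteriors(y, weights, means, variances):
    """Posterior p(k | y) under a diagonal-covariance GMM of noisy features.

    y: (D,), weights: (K,), means: (K, D), variances: (K, D).
    """
    diff = y - means
    log_lik = -0.5 * np.sum(diff**2 / variances
                            + np.log(2.0 * np.pi * variances), axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def splice_enhance(y, weights, means, variances, A, b):
    """Clean-feature estimate: sum_k p(k | y) * (A_k y + b_k).

    A: (K, D, D) and b: (K, D) are per-component affine parameters,
    assumed to have been estimated beforehand (e.g. from stereo data).
    """
    post = gmm_posteriors(y, weights, means, variances)
    per_component = np.einsum("kij,j->ki", A, y) + b   # shape (K, D)
    return post @ per_component
```

With every `A_k` set to the identity and every `b_k` to zero, the estimate reduces to the noisy input itself, which is a convenient sanity check.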
