Efficient SNR-Driven SPLICE Implementation for Robust Speech Recognition

The SPLICE algorithm has recently been proposed in the literature to address the robustness issue in Automatic Speech Recognition (ASR), and several variants have since been proposed to overcome some drawbacks of the original technique. In this presentation an efficient, innovative solution is discussed: it is based on SNR estimation in the frequency or mel domain and investigates the use of different noise types for GMM training, in order to maximize the generalization capability of the tool and therefore the recognition performance in the presence of unknown noise sources. Computer simulations conducted on the AURORA2 database seem to confirm the effectiveness of the idea: the proposed approach yields accuracy comparable to that of the reference method, while employing a simpler mismatch compensation paradigm that requires no a priori knowledge of the noises used in the training phase.
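
As a concrete illustration of the compensation scheme summarized above, the sketch below shows a SPLICE-style enhancer: a GMM is fitted on noisy features, per-component correction vectors are learned from stereo (noisy/clean) data, and an estimated SNR selects which trained model to apply. This is a minimal sketch of the standard SPLICE formulation, not the authors' implementation; the function names, the SNR band boundaries, and the use of scikit-learn's GaussianMixture are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_splice(noisy, clean, n_components=16):
    # Fit a GMM on noisy features and learn per-component correction vectors.
    # noisy, clean: (T, D) arrays of time-aligned (stereo) feature frames.
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(noisy)
    post = gmm.predict_proba(noisy)          # (T, K) posteriors p(k | y_t)
    diff = clean - noisy                     # stereo differences x_t - y_t
    # Correction vectors: r_k = sum_t p(k|y_t) (x_t - y_t) / sum_t p(k|y_t)
    corrections = (post.T @ diff) / (post.sum(axis=0)[:, None] + 1e-10)
    return gmm, corrections

def apply_splice(gmm, corrections, noisy):
    # MMSE-style compensation: x_hat = y + sum_k p(k|y) r_k
    post = gmm.predict_proba(noisy)
    return noisy + post @ corrections

def select_model_by_snr(snr_db, models, band_edges=(5.0, 15.0)):
    # Pick the (gmm, corrections) pair whose SNR band contains the estimate.
    # `models` is assumed to hold one trained pair per band (low/mid/high here).
    idx = int(np.searchsorted(band_edges, snr_db))
    return models[idx]

In this hypothetical setup, one (GMM, correction set) pair would be trained per SNR band, possibly pooling several noise types within each band, which mirrors the generalization idea described in the abstract without requiring a priori knowledge of the specific noise source at test time.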
