Noise-Adaptive LDA: A New Approach for Speech Recognition Under Observation Uncertainty

Automatic speech recognition (ASR) performance degrades severely under non-stationary noise, which has prevented widespread use of ASR in natural environments. Recently, so-called uncertainty-of-observation techniques have helped to recover good performance. These techniques treat the clean speech features as a hidden variable, of which the observable features are only an imperfect estimate. An estimated error variance of the features is therefore used to further guide recognition. Based on the same idea, we introduce a new strategy: reducing the speech feature dimensionality for optimal discriminability under observation uncertainty can yield significantly improved recognition performance, and the required transform is derived easily via Fisher's criterion of discriminant analysis.
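The abstract does not spell out the derivation, so the following is only a minimal sketch of how Fisher's criterion might be adapted to uncertain observations, not the authors' method. It assumes that the observed features equal the clean features plus a zero-mean error that is independent of the clean speech, so that the within-class scatter seen by the recognizer is the clean within-class scatter plus an error covariance. The function name `uncertainty_aware_lda`, the argument `error_cov`, and the toy data are all hypothetical.

```python
import numpy as np


def uncertainty_aware_lda(X, y, error_cov, n_components):
    """Fisher LDA with the within-class scatter inflated by an error covariance.

    Assumption (not taken from the paper): observed = clean + error, where the
    error is zero-mean, independent of the clean features, and has covariance
    `error_cov`. Under that assumption the between-class scatter is unchanged
    and the within-class scatter grows by `error_cov`.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    global_mean = X.mean(axis=0)

    S_w = np.zeros((d, d))  # within-class scatter of the clean features
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        centered = Xc - mu
        S_w += centered.T @ centered
        diff = (mu - global_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    S_w /= n
    S_b /= n

    # Within-class scatter as it would appear in the uncertain observations.
    S_w_obs = S_w + error_cov

    # Maximize Fisher's criterion by solving S_b w = lambda * S_w_obs w.
    # Whitening with the Cholesky factor gives a symmetric eigenproblem.
    L = np.linalg.cholesky(S_w_obs)      # S_w_obs = L @ L.T
    A = np.linalg.solve(L, S_b)          # L^{-1} S_b
    M = np.linalg.solve(L, A.T)          # L^{-1} S_b L^{-T}
    M = 0.5 * (M + M.T)                  # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:n_components]

    # Map the eigenvectors back to the original feature space.
    return np.linalg.solve(L.T, evecs[:, top])


# Toy data: two classes whose means differ mostly along dimension 0.
rng = np.random.default_rng(0)
n_per_class = 1000
X0 = rng.normal(size=(n_per_class, 2))
X1 = rng.normal(size=(n_per_class, 2)) + np.array([2.0, 1.0])
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n_per_class)

# No uncertainty: the projection relies mainly on dimension 0.
W_clean = uncertainty_aware_lda(X, y, np.zeros((2, 2)), n_components=1)

# Large error variance on dimension 0: the projection shifts to dimension 1.
W_noisy = uncertainty_aware_lda(X, y, np.diag([100.0, 0.0]), n_components=1)
```

In this toy setting the projection moves its weight away from the feature dimension that is declared unreliable, which is the behavior one would expect from a discriminant transform that accounts for observation uncertainty. How the error covariance is estimated, and whether it is applied per frame or averaged, is left open here.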
