Integration of beamforming and uncertainty-of-observation techniques for robust ASR in multi-source environments

This paper presents a new approach for increasing the robustness of multi-channel automatic speech recognition in noisy and reverberant multi-source environments. The proposed method uses uncertainty propagation techniques to dynamically compensate the speech features and the acoustic models for the observation uncertainty determined at the beamforming stage. We present and analyze two methods for integrating classical multi-channel signal processing approaches, such as delay-and-sum beamformers or Zelinski-type Wiener filters, with uncertainty-of-observation techniques, such as uncertainty decoding or modified imputation. An analysis of the results on the PASCAL-CHiME task shows that this approach consistently outperforms conventional beamformers with a minimal increase in computational complexity. Dynamic compensation based on observation uncertainty also outperforms conventional static adaptation, without requiring any adaptation data.
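
To make the two uncertainty-of-observation techniques named above concrete, the following is a minimal sketch of how each is commonly formulated for a single diagonal-covariance Gaussian. It is not the paper's implementation. The function names and the assumption that the front end delivers a per-dimension feature mean and variance are illustrative only. Uncertainty decoding adds the feature variance to the model variance; modified imputation re-estimates the feature as a variance-weighted mix of the observed feature and the model mean, then scores it against the unmodified model.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var):
    """Score a feature vector by inflating each model variance
    with the corresponding observation uncertainty (feat_var)."""
    return sum(
        log_gauss(x, m, v + u)
        for x, u, m, v in zip(feat_mean, feat_var, model_mean, model_var)
    )


def modified_imputation_loglik(feat_mean, feat_var, model_mean, model_var):
    """Re-estimate each feature as a variance-weighted combination of the
    observed feature and the model mean, then score it with the original
    (uninflated) model variance."""
    total = 0.0
    for x, u, m, v in zip(feat_mean, feat_var, model_mean, model_var):
        x_mi = (v * x + u * m) / (v + u)  # pulled toward m as u grows
        total += log_gauss(x_mi, m, v)
    return total


# Hypothetical two-dimensional example: the second dimension is unreliable.
feat_mean, feat_var = [2.0, 2.0], [0.0, 1.0]
model_mean, model_var = [0.0, 0.0], [1.0, 1.0]
ud = uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var)
mi = modified_imputation_loglik(feat_mean, feat_var, model_mean, model_var)
```

When the feature variance is zero, both functions reduce to the ordinary Gaussian log-likelihood, so the compensation only acts on dimensions the front end marks as uncertain.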
