On the impact of signal preprocessing for robust distant speech recognition in adverse acoustic environments

In this contribution, a two-channel acoustic front-end for robust automatic speech recognition (ASR) in adverse acoustic environments is analyzed. The source signal extraction scheme combines two components: a blocking matrix based on semi-blind source separation, which provides a continuously updated reference of all undesired components separated from the desired signal and its reflections, and a single-channel Wiener postfilter. The postfilter is derived directly from the obtained noise and interference reference signal and hence generalizes well-known postfilter realizations. The proposed front-end and its integration into an ASR system are evaluated with respect to keyword accuracy under reverberant conditions with unpredictable and nonstationary interferences, and for different target source distances. The proposed scheme is compared with a simplified front-end based on free-field assumptions, with an ideal front-end that assumes knowledge of the true undesired components, and with the competing approach of relying solely on multistyle training. These comparisons demonstrate the importance of adequate signal preprocessing for robust distant speech recognition.
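
To illustrate how a single-channel Wiener postfilter can be driven by a noise and interference reference, the following is a minimal sketch and not the implementation evaluated here. It assumes power spectral density (PSD) estimates per time-frequency bin are obtained by first-order recursive averaging, and it applies a lower gain limit to reduce musical noise. The function names, the smoothing constant, and the gain floor are hypothetical choices made for illustration only.

```python
import numpy as np


def recursive_psd(stft_frames, smoothing=0.9):
    """Estimate a PSD by first-order recursive averaging of |X|^2.

    stft_frames: complex array of shape (num_frames, num_bins).
    smoothing:   assumed forgetting factor in [0, 1).
    """
    power = np.abs(stft_frames) ** 2
    psd = np.empty_like(power)
    acc = power[0]  # initialise with the first frame
    for t, p in enumerate(power):
        acc = smoothing * acc + (1.0 - smoothing) * p
        psd[t] = acc
    return psd


def wiener_postfilter_gain(mixture_psd, noise_ref_psd, gain_floor=0.1, eps=1e-12):
    """Wiener-type spectral gain 1 - noise_psd / mixture_psd, limited to [gain_floor, 1].

    mixture_psd:   PSD of the signal to be filtered (same shape as noise_ref_psd).
    noise_ref_psd: PSD of the noise and interference reference.
    gain_floor:    assumed lower limit on the gain (illustrative value).
    """
    ratio = noise_ref_psd / np.maximum(mixture_psd, eps)
    return np.clip(1.0 - ratio, gain_floor, 1.0)


# Toy example with three frequency bins of one frame.
mixture = np.array([4.0, 4.0, 4.0])
noise_ref = np.array([1.0, 4.0, 8.0])
gain = wiener_postfilter_gain(mixture, noise_ref)
```

In this toy example the first bin is attenuated only mildly, while the two bins dominated by the reference are limited to the gain floor. In a full front-end the gain would be multiplied with the short-time spectrum of the signal to be enhanced before resynthesis.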
