A FLEXIBLE SPATIAL BLIND SOURCE EXTRACTION FRAMEWORK FOR ROBUST SPEECH RECOGNITION IN NOISY ENVIRONMENTS

Blind source extraction (BSE) is an attractive approach for enhancing multichannel noisy speech as a preprocessing step for an automatic speech recognition (ASR) system. BSE was successfully applied in the first PASCAL CHiME Challenge to improve the recognition rate of noisy commands in a small-dictionary task. In this work we revisit the BSE architecture and improve each block of the framework in order to increase its flexibility and degree of blindness. Two different algorithms are implemented to address both tracks of the 2nd CHiME Challenge. To improve overall performance, the output of the enhancement algorithm is then combined with robust ASR systems based on gammatone feature analysis and on uncertainty decoding. Results obtained with different front-end and back-end configurations demonstrate the advantages of the proposed approaches.
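The abstract does not say how uncertainty decoding is realised in the back-end. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below uses one common formulation: each enhanced feature is treated as a Gaussian estimate with its own variance, and that variance is added to the acoustic model variance before the likelihood is evaluated. All function names and numeric values are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gaussian_uncertain(x_hat, x_var, mean, var):
    """Likelihood of an uncertain observation (illustrative assumption).

    The enhanced feature is modelled as Gaussian with mean x_hat and
    variance x_var. Marginalising over the unknown clean feature gives a
    Gaussian whose variance is the model variance plus x_var.
    """
    inflated = [v + u for v, u in zip(var, x_var)]
    return log_gaussian(x_hat, mean, inflated)


# Toy example: one feature dimension, a unit-variance model component,
# and an enhanced feature that lies far from the model mean.
plain = log_gaussian([3.0], [0.0], [1.0])
hedged = log_gaussian_uncertain([3.0], [4.0], [0.0], [1.0])
```

In this toy case the unreliable feature is penalised less when its uncertainty is taken into account, so `hedged` is larger than `plain`. With zero uncertainty the two functions give the same value.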
