Dual-Channel VTS Feature Compensation with Improved Posterior Estimation

The use of dual microphones is a powerful tool for noise-robust automatic speech recognition (ASR). In particular, it allows the reformulation of classical techniques such as vector Taylor series (VTS) feature compensation. In this work, we consider a critical issue in VTS compensation, namely posterior computation, and propose an alternative way to estimate these probabilities more accurately when VTS is applied to enhance noisy speech captured by dual-microphone mobile devices. Our proposal models the conditional dependence of a noisy secondary channel given a primary one, and outperforms not only single-channel VTS feature compensation but also a previous dual-channel VTS approach based on a stacked formulation. This is confirmed by recognition experiments on two different dual-channel extensions of the Aurora-2 corpus. These extensions emulate the use of a dual-microphone smartphone in close-talk and far-talk conditions, with our proposal achieving relevant improvements in the latter case.
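As a rough illustration of the posterior computation mentioned above, the following minimal sketch evaluates component posteriors P(k | y1, y2) by factoring the dual-channel likelihood as p(y1 | k) p(y2 | y1, k), using the standard conditional-Gaussian identities. The function names, the stacked layout of `means` and `covs`, and the use of a plain Gaussian mixture are all assumptions made for this example; the abstract does not specify how the conditional term is actually derived through VTS. Note that with exact joint Gaussian parameters this factored form is numerically equivalent to evaluating the stacked density, so the sketch shows only the factorization, not the source of the reported gains.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of a multivariate Gaussian N(x; mean, cov)."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def dual_channel_posteriors(y1, y2, weights, means, covs):
    """Posteriors P(k | y1, y2) computed as P(k) p(y1 | k) p(y2 | y1, k).

    Assumed (hypothetical) layout: means[k] stacks [mu1; mu2] and covs[k]
    is the matching block covariance [[S11, S12], [S21, S22]].
    """
    n = len(y1)
    log_post = np.empty(len(weights))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        mu1, mu2 = mu[:n], mu[n:]
        s11, s12 = cov[:n, :n], cov[:n, n:]
        s21, s22 = cov[n:, :n], cov[n:, n:]
        # Conditional Gaussian of the secondary channel given the primary one.
        gain = s21 @ np.linalg.inv(s11)
        cond_mean = mu2 + gain @ (y1 - mu1)
        cond_cov = s22 - gain @ s12
        log_post[k] = (np.log(w)
                       + log_gauss(y1, mu1, s11)
                       + log_gauss(y2, cond_mean, cond_cov))
    # Normalize in the log domain for numerical stability.
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()
```

In a VTS-style compensation scheme, posteriors of this kind would typically weight per-component clean-speech estimates; that step is omitted here because the abstract does not describe it.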
