Paper information - Model-based compressive sensing for multi-party distant speech recognition

We leverage recent algorithmic advances in compressive sensing and propose a novel source separation algorithm for efficient recovery of convolutive speech mixtures in the spectro-temporal domain. Compared to common sparse component analysis techniques, our approach fully exploits structured sparsity models to obtain a substantial improvement over the existing state of the art. We evaluate our method on the separation and recognition of a target speaker in a multi-party scenario. Our results provide compelling evidence of the effectiveness of sparse recovery formulations in speech recognition.
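To make the contrast between plain sparsity and a structured sparsity model concrete, the sketch below compares two projection steps of the kind used inside iterative sparse recovery. It is a minimal illustration only: the abstract does not say which structured model the authors use, so block sparsity is an assumption here, and the names `hard_threshold` and `block_threshold` are hypothetical.

```python
def hard_threshold(x, s):
    """Plain sparsity: keep the s largest-magnitude entries, zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:s])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def block_threshold(x, block_size, k):
    """Block-sparse model (assumed here): keep the k contiguous blocks
    with the largest energy and zero all other blocks."""
    if len(x) % block_size != 0:
        raise ValueError("len(x) must be a multiple of block_size")
    n_blocks = len(x) // block_size
    # Energy of each block is the sum of squared entries in that block.
    energy = [
        sum(v * v for v in x[b * block_size:(b + 1) * block_size])
        for b in range(n_blocks)
    ]
    order = sorted(range(n_blocks), key=lambda b: energy[b], reverse=True)
    keep = set(order[:k])
    return [v if (i // block_size) in keep else 0.0 for i, v in enumerate(x)]


# Toy coefficient vector with four blocks of two entries each.
x = [0.9, 0.8, 0.7, 0.1, 0.0, 1.0, 0.1, 0.0]

plain = hard_threshold(x, 2)                     # two entries, anywhere
blocked = block_threshold(x, block_size=2, k=1)  # two entries, one block
```

Both projections keep two coefficients, but the plain one picks the two largest entries wherever they fall, while the block version is forced to keep a single contiguous group. Restricting the admissible supports in this way is the general idea behind model-based recovery.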

Volkan Cevher | Hervé Bourlard | Afsaneh Asaei

[1] Scott Rickard, et al. Blind separation of speech mixtures via time-frequency masking, 2004, IEEE Transactions on Signal Processing.

[2] Rémi Gribonval, et al. A survey of Sparse Component Analysis for blind source separation: principles, perspectives, and new challenges, 2006, ESANN.

[3] Shahrokh Valaee, et al. Multiple Target Localization Using Compressive Sensing, 2009, GLOBECOM 2009 - 2009 IEEE Global Telecommunications Conference.

[4] Martin J. McKeown, et al. Underdetermined Anechoic Blind Source Separation via $\ell^{q}$-Basis-Pursuit With $q<1$, 2007, IEEE Transactions on Signal Processing.

[5] Michael Zibulevsky, et al. Underdetermined blind source separation using sparse representations, 2001, Signal Process.

[6] Volkan Cevher, et al. Near-optimal Bayesian localization via incoherence and sparsity, 2009, 2009 International Conference on Information Processing in Sensor Networks.

[7] Jont B. Allen, et al. Image method for efficiently simulating small-room acoustics, 1979, The Journal of the Acoustical Society of America.

[8] Barak A. Pearlmutter, et al. Soft-LOST: EM on a Mixture of Oriented Lines, 2004, ICA.

[9] Hervé Bourlard, et al. Sparse component analysis for speech recognition in multi-speaker environment, 2010, INTERSPEECH.

[10] Volkan Cevher, et al. An ALPS view of sparse recovery, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11] David Pearce, et al. The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, 2000, INTERSPEECH.

[12] Andreas Stolcke, et al. Observations on overlap: findings and implications for automatic processing of multi-party conversation, 2001, INTERSPEECH.

[13] Volkan Cevher, et al. Model-Based Compressive Sensing, 2008, IEEE Transactions on Information Theory.