Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant rooms. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from unknown competing speech sources. The scheme localizes the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further resolve the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization that exploits a joint sparsity model built on the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then used to separate the individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.
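
To make the link between speaker images, the early support of the room impulse response, and the room geometry concrete, the following minimal sketch computes first-order image positions and the resulting arrival delays. It is the forward image-source model only, not the estimation procedure described above, and it rests on assumptions that are not in the abstract: a rectangular (shoebox) room with walls at 0 and at the room dimension along each axis, a speed of sound of 343 m/s, and the hypothetical names `first_order_images` and `early_support`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def first_order_images(source, room):
    """Mirror the source across each of the six walls of a shoebox room.

    Walls are assumed to lie at 0 and at room[axis] along each axis.
    Returns the six first-order image positions as tuples.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            # Reflection across the plane coordinate == wall
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def early_support(source, mic, room, fs):
    """Sorted sample delays of the direct path and first-order echoes."""
    points = [tuple(source)] + first_order_images(source, room)
    return sorted(
        round(math.dist(p, mic) / SPEED_OF_SOUND * fs) for p in points
    )


if __name__ == "__main__":
    room = (4.0, 5.0, 3.0)      # hypothetical room dimensions in meters
    source = (1.0, 2.0, 1.5)    # hypothetical speaker position
    mic = (2.0, 2.0, 1.5)       # hypothetical microphone position
    print(first_order_images(source, room))
    print(early_support(source, mic, room, fs=16000))
```

In this toy setting each nonzero tap of the early impulse response corresponds to one image position, which is why localizing the images constrains the wall positions. The reflection amplitudes, which depend on the absorption coefficients, are deliberately left out of the sketch.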
