Multi-channel speech enhancement using sparse coding on local time-frequency structures

A novel multi-channel speech enhancement technique is proposed in the present paper. We focus on the local sparsities of speech signals in contrast to the conventional beamforming and blind source seperation methods. The technique utilizes the difference of local structures in temporary-frequency domain between the target speech and interfering signals for enhancing the target speech. We first estimate the local structures of the speech and noise signals at each time-frequency bin to form a local dictionary, and then recover the clean speech via sparse coding. The proposed algorithm is simple to implement and requires no prior knowledge of speech and noise. Our experimental evaluations demonstrate that the proposed method can suppress interferer and meantime preserve target speech more than some conventional methods.

[1]  Joachim M. Buhmann,et al.  Speech Enhancement Using Generative Dictionary Learning , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  John W. McDonough,et al.  Adaptive Beamforming With a Minimum Mutual Information Criterion , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Heping Ding,et al.  Combining Superdirective Beamforming and Frequency-Domain Blind Source Separation for Highly Reverberant Signals , 2010, EURASIP J. Audio Speech Music. Process..

[4]  Tuomas Virtanen,et al.  Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  L. J. Griffiths,et al.  An alternative approach to linearly constrained adaptive beamforming , 1982 .

[6]  Yi Hu,et al.  Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Daniel Gatica-Perez,et al.  Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Walter Kellermann,et al.  An Acoustic Human-Machine Front-End for Multimedia Applications , 2003, EURASIP J. Adv. Signal Process..

[9]  Michael S. Brandstein,et al.  Microphone Arrays - Signal Processing Techniques and Applications , 2001, Microphone Arrays.

[10]  Sridha Sridharan,et al.  Near-field Adaptive Beamformer for Robust Speech Recognition , 2002, Digit. Signal Process..

[11]  E.J. Candes,et al.  An Introduction To Compressive Sampling , 2008, IEEE Signal Processing Magazine.

[12]  Jack Xin,et al.  Multi-Channel $l_{1}$ Regularized Convex Speech Enhancement Model and Fast Computation by the Split Bregman Method , 2012, IEEE Transactions on Audio, Speech, and Language Processing.