Underdetermined convolutive speech separation method based on channel identification and sparse recovery

In this paper, we consider the problem of separating speech sources from their underdetermined convolutive mixtures by means of channel identification and sparse recovery. The proposed algorithm does not require any prior knowledge of the source geometry or of direction-of-arrival (DOA) information. Its first step estimates the convolutive channel from the speech mixtures, after a clustering procedure has selected the time intervals during which only one source signal is effectively present. Its second step recovers the speech signals using a compressed sensing (CS) approach that exploits their sparse structure. Numerical experiments, including comparisons with other separation approaches for convolutive speech mixtures, show that the algorithm achieves improved separation performance.
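To make the sparse-recovery idea in the second step concrete, the following is a minimal sketch of recovering a sparse vector from an underdetermined linear system. It is an illustration under stated assumptions, not the paper's method: the abstract does not name the solver, so orthogonal matching pursuit (OMP) is swapped in as a simple stand-in, and the matrix `A` (an identity block next to a normalized Hadamard block), the function names `omp` and `hadamard`, and the test signal are all hypothetical. In the separation setting, `A` would play the role of an operator built from the estimated channels and a sparsifying dictionary.

```python
import numpy as np


def omp(A, y, k, tol=1e-10):
    """Orthogonal matching pursuit: find x with at most k nonzeros so that A @ x ~= y.

    Assumes the columns of A have unit norm. This is a generic stand-in solver,
    not the recovery algorithm used in the paper.
    """
    n = A.shape[1]
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Least-squares fit of y on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    if support:
        x[support] = coef
    return x


def hadamard(m):
    """Sylvester construction of an m x m Hadamard matrix (m a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H


# Hypothetical underdetermined system: 64 measurements, 128 unknowns.
m = 64
A = np.hstack([np.eye(m), hadamard(m) / np.sqrt(m)])  # unit-norm columns

# A 3-sparse ground-truth vector (illustrative values only).
x_true = np.zeros(2 * m)
x_true[[5, 70, 100]] = [1.5, -2.0, 0.7]

y = A @ x_true          # observed mixture
x_hat = omp(A, y, k=3)  # sparse estimate

print("recovered support:", np.flatnonzero(np.abs(x_hat) > 1e-8))
print("max abs error:", np.max(np.abs(x_hat - x_true)))
```

The system has twice as many unknowns as equations, so it has no unique solution in general. The sparsity assumption is what makes recovery possible here, and the low coherence between the two blocks of `A` is what lets the greedy selection find the correct columns.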
