Local Gaussian model with source-set constraints in audio source separation

To improve blind source separation of convolutive audio mixtures, we extend the local Gaussian model (LGM) with full-rank spatial covariance matrices proposed by Duong et al. The original model assumes that every source contributes to every time-frequency slot, which poorly captures signals with many intermittent silent periods. We therefore explicitly introduce a constraint on the set of sources that contribute to each time-frequency slot; this constraint can be regarded as a relaxation of the sparsity constraint underlying the conventional time-frequency mask. The proposed model is optimized jointly over the original LGM parameters, the relaxed time-frequency mask, and a permutation alignment, yielding a robust permutation-free algorithm. We also present a novel multichannel Wiener filter weighted by the relaxed time-frequency mask. Experimental results on noisy speech signals show that the proposed model is more effective than the original LGM and comparable to its extension, multichannel nonnegative matrix factorization.
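Under the LGM, each source image at time-frequency slot (f, t) is zero-mean Gaussian with covariance v_{j,ft} R_{j,f}, and sources are recovered by a multichannel Wiener filter. The mask-weighted filter described above can be sketched as follows; this is a minimal illustration, not the paper's exact algorithm, and all function and variable names (`mask_weighted_mcwf`, `v`, `R`, `z`) are our own:

```python
import numpy as np

def mask_weighted_mcwf(x, v, R, z, eps=1e-10):
    """Mask-weighted multichannel Wiener filter (illustrative sketch).

    x : (F, T, M) complex STFT of the M-channel mixture
    v : (J, F, T) nonnegative source variances v_{j,f,t}
    R : (J, F, M, M) full-rank spatial covariance matrices R_{j,f}
    z : (J, F, T) relaxed time-frequency mask weights in [0, 1]
    Returns (J, F, T, M) multichannel source image estimates.
    """
    J, F, T = v.shape
    M = x.shape[-1]
    # Mixture covariance per slot, with the relaxed mask z down-weighting
    # sources assumed inactive there:  Sigma_x = sum_j z_j v_j R_j
    Sigma_x = np.einsum('jft,jfmn->ftmn', z * v, R)
    Sigma_x = Sigma_x + eps * np.eye(M)  # regularize for invertibility
    inv_Sigma = np.linalg.inv(Sigma_x)   # batched inverse over (f, t)
    y = np.empty((J, F, T, M), dtype=complex)
    for j in range(J):
        # Wiener gain W_j = z_j v_j R_j Sigma_x^{-1}; estimate y_j = W_j x
        Wj = np.einsum('ft,fmn,ftnk->ftmk', z[j] * v[j], R[j], inv_Sigma)
        y[j] = np.einsum('ftmk,ftk->ftm', Wj, x)
    return y
```

Because the filters sum to (approximately) the identity, the source image estimates add up to the observed mixture, which is the usual conservative property of the multichannel Wiener filter.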

[1] Takeshi Yamada et al., "Comparison of superimposition and sparse models in blind source separation by multichannel Wiener filter," Proceedings of the 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2012.

[2] Emmanuel Vincent et al., "Multichannel Audio Source Separation With Deep Neural Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016.

[3] Emmanuel Vincent et al., "Multichannel music separation with deep neural networks," 24th European Signal Processing Conference (EUSIPCO), 2016.

[4] P. Smaragdis et al., "Non-negative matrix factorization for polyphonic music transcription," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2003.

[5] Rémi Gribonval et al., "Under-determined convolutive blind source separation using spatial covariance models," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010.

[6] Emmanuel Vincent et al., "A General Flexible Framework for the Handling of Prior Information in Audio Source Separation," IEEE Transactions on Audio, Speech, and Language Processing, 2012.

[7] Rémi Gribonval et al., "Performance measurement in blind audio source separation," IEEE Transactions on Audio, Speech, and Language Processing, 2006.

[8] Jon Barker et al., "The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines," IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015.

[9] Tomohiro Nakatani et al., "Permutation-free convolutive blind source separation via full-band clustering based on frequency-independent source presence priors," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.

[10] Alexey Ozerov et al., "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.

[11] Nancy Bertin et al., "Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis," Neural Computation, 2009.

[12] Hirokazu Kameoka et al., "Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data," IEEE Transactions on Audio, Speech, and Language Processing, 2013.

[13] Rémi Gribonval et al., "Spatial covariance models for under-determined reverberant audio source separation," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009.

[14] Pierre Vandergheynst et al., "Nonnegative matrix factorization and spatial covariance model for under-determined reverberant audio source separation," 10th International Conference on Information Science, Signal Processing and their Applications (ISSPA), 2010.

[15] Alexey Ozerov et al., "Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation," IEEE Transactions on Audio, Speech, and Language Processing, 2010.

[16] Rémi Gribonval et al., "Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model," IEEE Transactions on Audio, Speech, and Language Processing, 2009.

[17] Rémi Gribonval et al., "Underdetermined Instantaneous Audio Source Separation via Local Gaussian Modeling," International Conference on Independent Component Analysis and Signal Separation (ICA), 2009.

[18] J. Cardoso et al., "Maximum likelihood approach for blind audio source separation using time-frequency Gaussian source models," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005.

[19] Alexey Ozerov et al., "Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures With Application to Blind Audio Source Separation," 2009.

[20] Takuya Yoshioka et al., "Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

[21] Scott Rickard et al., "Blind separation of speech mixtures via time-frequency masking," IEEE Transactions on Signal Processing, 2004.

[22] Tomohiro Nakatani et al., "Hybrid approach for multichannel source separation combining time-frequency mask with multi-channel Wiener filter," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.