Underdetermined reverberant acoustic source separation using weighted full-rank nonnegative tensor models.

In this paper, a fusion of K models of the full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D) is proposed to separate acoustic sources that have been mixed in an underdetermined reverberant environment. The model is adapted in an unsupervised manner under a hybrid framework of the generalized expectation-maximization and multiplicative update algorithms. The derivation of the algorithm and the development of the proposed full-rank K-wNTF2D are presented. The algorithm also encodes a set of variable sparsity parameters, derived from the Gibbs distribution, into the K-wNTF2D model. This optimizes each sub-model in K-wNTF2D with the sparsity required to model the time-varying variances of the sources in the spectrogram. In addition, a method is proposed for initializing the parameters of the K-wNTF2D. Experimental results in an underdetermined reverberant mixing environment show that the proposed algorithm separates the mixture effectively, achieving an average signal-to-distortion ratio of 3 dB.
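To give a sense of scale for the reported 3 dB figure, the following is a minimal sketch of a simplified signal-to-distortion ratio, assuming the reference source and its estimate are time-aligned and equally scaled. It is not the evaluation procedure used in the paper, which is not described in this abstract; the function name `sdr_db` and the synthetic signals are hypothetical and serve only as illustration.

```python
import numpy as np


def sdr_db(reference, estimate):
    """Simplified SDR in dB: source energy over the energy of the residual.

    Assumption: the estimate is already aligned with the reference, so no
    filtering or projection of the reference is applied before comparison.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = estimate - reference
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(distortion**2))


# Synthetic illustration (not data from the paper).
rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
# Rescale the error so that its energy is exactly half the source energy.
noise *= np.sqrt(0.5 * np.sum(source**2) / np.sum(noise**2))
estimate = source + noise

print(round(sdr_db(source, estimate), 2))  # 3.01
```

Under this simplified definition, an SDR of about 3 dB corresponds to a residual error carrying roughly half the energy of the true source.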
