Correlated Tensor Factorization for Audio Source Separation

This paper presents an ultimate extension of nonnegative matrix factorization (NMF) for audio source separation based on full covariance modeling over all the time-frequency (TF) bins of the complex spectrogram of an observed mixture signal. Although NMF has been widely used for decomposing an observed power spectrogram in a TF-wise manner, it has a critical limitation: it cannot deal with the phase values of interdependent TF bins. This problem has been solved only partially by several phase-aware extensions of NMF that decompose an observed complex spectrogram in a time- and/or frequency-wise manner. In this paper, we propose correlated tensor factorization (CTF), which approximates the full covariance matrix over all TF bins as the sum of the Kronecker products between basis covariance matrices over frequency bands and the corresponding ones over time frames. All the TF bins of the complex spectrogram of each source signal are estimated jointly in an interdependent manner via Wiener filtering. We discuss how to reduce the computational cost of CTF and report the results of a comparative evaluation of CTF against its special cases, such as NMF and positive semidefinite tensor factorization (PSDTF).
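The covariance structure and the Wiener-filtering step described above can be illustrated with a small NumPy sketch. It is not the paper's algorithm: it does no parameter estimation, and it assumes that each source owns its own group of K basis pairs, that the spectrogram is vectorized frame by frame (so the time-frame basis is the left Kronecker factor), and that random positive semidefinite matrices stand in for learned bases. The helper names `random_psd`, `source_covariance`, and `wiener_separate` are hypothetical.

```python
import numpy as np

def random_psd(dim, rng):
    """Random complex Hermitian positive semidefinite matrix (toy stand-in for a learned basis)."""
    a = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    return a @ a.conj().T / dim

def source_covariance(freq_bases, time_bases):
    """Full covariance over all TF bins as a sum of Kronecker products.

    Assumption: the spectrogram is stacked frame by frame, so entry
    (t * F + f) of the vector is bin (f, t) and the time basis goes first.
    """
    return sum(np.kron(v, w) for w, v in zip(freq_bases, time_bases))

def wiener_separate(x, source_covs):
    """Jointly estimate all TF bins of each source: C_n @ inv(sum_m C_m) @ x."""
    total = sum(source_covs)
    z = np.linalg.solve(total, x)  # avoids forming the explicit inverse
    return [c @ z for c in source_covs]

rng = np.random.default_rng(0)
F, T, K = 4, 3, 2  # frequency bins, time frames, basis pairs per source (toy sizes)

covs = []
for _ in range(2):  # two sources
    freq_bases = [random_psd(F, rng) for _ in range(K)]
    time_bases = [random_psd(T, rng) for _ in range(K)]
    covs.append(source_covariance(freq_bases, time_bases))

# Toy vectorized complex spectrogram of the mixture (length F * T).
x = rng.standard_normal(F * T) + 1j * rng.standard_normal(F * T)
estimates = wiener_separate(x, covs)

# The Wiener estimates add back up to the mixture.
print(np.allclose(sum(estimates), x))
```

Even at these toy sizes the full covariance is an (F·T) × (F·T) matrix, and solving with it scales cubically in F·T, which is why reducing the computational cost matters for spectrograms of realistic size.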
