Independent Low-Rank Tensor Analysis for Audio Source Separation

This paper describes a versatile tensor factorization technique called independent low-rank tensor analysis (ILRTA) and its application to single-channel audio source separation. Audio source separation has generally been conducted in the short-time Fourier transform (STFT) domain under the conventional but unrealistic assumption that time-frequency (TF) bins are independent. Nonnegative matrix factorization (NMF) is a typical technique for single-channel source separation based on the low-rankness of source spectrograms. In a multichannel setting, independent component analysis (ICA) and its multivariate extension, independent vector analysis (IVA), have often been used for blind source separation based on the independence of source spectrograms. Independent low-rank matrix analysis (ILRMA) was recently proposed as an integration of NMF and IVA. To deal with the covariance of TF bins, in this paper we propose ILRTA as a new extension of NMF. Both ILRMA and ILRTA aim to find independent and low-rank sources. The key difference is that ILRMA estimates demixing filters that decorrelate the channels for multichannel source separation, whereas ILRTA finds optimal transforms that decorrelate the time frames and frequency bins of an STFT representation for single-channel source separation, so that the bin-wise independence assumed by NMF holds as closely as possible. We report evaluation results of ILRTA and discuss the extension of ILRTA to multichannel source separation.
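As a point of reference for the NMF baseline mentioned above, the following is a minimal sketch of low-rank spectrogram factorization followed by mask-based separation. It is not ILRTA and not the authors' implementation: it uses the standard multiplicative updates for the squared Euclidean cost, and the function name `nmf`, the toy matrix sizes, the one-basis-per-source grouping, and the soft-mask step are all assumptions made for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix X (freq x time) as W @ H.

    Uses multiplicative updates for the squared Euclidean cost.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    W = rng.random((n_freq, rank)) + eps   # spectral bases
    H = rng.random((rank, n_time)) + eps   # temporal activations
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative and does not
        # increase the reconstruction error (up to the eps guard).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy nonnegative "spectrogram" built from two rank-1 sources (assumed data).
rng = np.random.default_rng(1)
X = rng.random((16, 2)) @ rng.random((2, 40))

W, H = nmf(X, rank=2)
model = W @ H

# Soft masks: each basis claims a fraction of every TF bin.
# Assumption: one basis per source; real systems group several bases.
sources = [np.outer(W[:, k], H[k]) / (model + 1e-12) * X for k in range(2)]

rel_err = np.linalg.norm(X - model) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the masks sum to one in every bin, the separated spectrograms add back up to the input. This per-bin treatment is exactly where the bin-wise independence assumption enters, which is the assumption ILRTA tries to make hold by learning decorrelating transforms.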
