Student's t nonnegative matrix factorization and positive semidefinite tensor factorization for single-channel audio source separation

This paper presents a robust variant of nonnegative matrix factorization (NMF) based on complex Student's t distributions (t-NMF) for source separation of single-channel audio signals. NMF with the Itakura-Saito divergence (Gaussian NMF) is justified for this purpose under the assumption that the complex spectra of the source signals and of the mixture signal are complex Gaussian distributed, in which case the additivity of power spectra holds. In practice, however, source spectra often follow heavy-tailed distributions. When the source spectra are complex Cauchy distributed, for example, the mixture spectra are also complex Cauchy distributed, and the additivity of amplitude spectra holds instead. Using the complex t distribution, which includes the complex Gaussian and Cauchy distributions as special cases, we propose t-NMF as a unified extension of Gaussian NMF and Cauchy NMF. Furthermore, we propose the corresponding variant of positive semidefinite tensor factorization based on multivariate complex t distributions (t-PSDTF). The experimental results showed that while t-NMF and t-PSDTF were comparable to their Gaussian counterparts in terms of peak performance, they worked much better on average because they are insensitive to initialization and tend to avoid local optima.
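For orientation, the Gaussian NMF baseline mentioned above can be sketched as follows. This is a minimal illustration of standard multiplicative updates for the Itakura-Saito divergence, not the t-NMF algorithm proposed in the paper. The function names (`is_divergence`, `is_nmf`), the square-root exponent on the update ratios, the `eps` safeguard, and the synthetic power spectrogram are all assumptions made for this example.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all time-frequency bins."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, n_bases, n_iter=100, seed=0, eps=1e-12):
    """Approximate a positive power spectrogram V (F x T) as W @ H.

    Hypothetical helper: multiplicative updates for the IS divergence,
    with a square-root exponent on each update ratio (an assumed,
    commonly used choice).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_bases)) + eps  # basis spectra
    H = rng.random((n_bases, T)) + eps  # temporal activations
    history = []
    for _ in range(n_iter):
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history

# Synthetic stand-in for a power spectrogram (assumed data, not from the paper).
rng = np.random.default_rng(1)
V = rng.random((32, 3)) @ rng.random((3, 40)) + 1e-3
W, H, history = is_nmf(V, n_bases=3, n_iter=50)
print(history[0], history[-1])
```

Because the result depends on the random initial values of `W` and `H`, running the sketch with different `seed` values is one simple way to observe the sensitivity to initialization that the abstract attributes to the Gaussian model.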
