Majorization-Minimization Algorithm for Discriminative Non-Negative Matrix Factorization

This paper proposes a basis training algorithm for discriminative non-negative matrix factorization (NMF) with applications to single-channel audio source separation. In the NMF-based approach to supervised audio source separation, NMF is first applied to train the basis spectra of each source using training examples, and is then applied at test time to the spectrogram of a mixture signal using the pretrained basis spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra is to minimize a dissimilarity measure between the observed spectrogram and the NMF model. However, basis spectra obtained in this way do not ensure that the separated signal will be optimal at test time, because the objective function used for training is inconsistent with the one used for separation (Wiener filtering). To address this mismatch, a framework called discriminative NMF (DNMF) has recently been proposed. While this framework is noteworthy in that it uses a common objective function for training and separation, the objective function becomes more analytically complex than that of regular NMF. In the original DNMF work, a multiplicative update algorithm was proposed for the basis training; however, its convergence is not guaranteed, and it can be very slow. To overcome this weakness, this paper proposes a convergence-guaranteed algorithm for DNMF based on the majorization-minimization principle. Experimental results show that the proposed algorithm outperforms the conventional DNMF algorithm as well as the regular NMF algorithm in terms of both the signal-to-distortion and signal-to-interference ratios.
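The regular (non-discriminative) NMF pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed DNMF algorithm: it assumes the generalized Kullback-Leibler divergence as the dissimilarity measure with the standard multiplicative updates, applies a Wiener-like ratio mask to magnitude spectrograms, and uses random non-negative matrices in place of real spectrograms. The names `kl_nmf` and `separate` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf(V, n_basis=None, W=None, n_iter=200, seed=0):
    """Factorize V ~= W @ H with multiplicative updates for the KL divergence.

    If W is given, it is held fixed and only H is estimated (test-time use).
    """
    rng = np.random.default_rng(seed)
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], n_basis)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        R = V / (W @ H + EPS)
        H = H * (W.T @ R) / (W.T @ ones + EPS)
        if update_W:
            R = V / (W @ H + EPS)
            W = W * (R @ H.T) / (ones @ H.T + EPS)
    return W, H


def separate(V_mix, bases):
    """Estimate each source from a mixture using pretrained, fixed bases."""
    W_all = np.hstack(bases)
    _, H_all = kl_nmf(V_mix, W=W_all)
    # Split the activations back into per-source model spectrograms.
    models, start = [], 0
    for W in bases:
        k = W.shape[1]
        models.append(W @ H_all[start:start + k])
        start += k
    total = sum(models) + EPS
    # Wiener-like mask: each source's share of the total model energy.
    return [m / total * V_mix for m in models]


# Synthetic stand-ins for training spectrograms (assumption: no real audio).
rng = np.random.default_rng(1)
V1 = rng.random((16, 3)) @ rng.random((3, 40))
V2 = rng.random((16, 3)) @ rng.random((3, 40))

W1, _ = kl_nmf(V1, n_basis=3)   # training stage, source 1
W2, _ = kl_nmf(V2, n_basis=3)   # training stage, source 2

V_mix = V1 + V2                 # test-time mixture
S1, S2 = separate(V_mix, [W1, W2])
```

Note that the training stage here only minimizes the reconstruction error of each source in isolation, while the quality of `S1` and `S2` depends on the mask computed at test time. This is the mismatch between training and separation objectives that DNMF is designed to remove.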
