Discriminative and reconstructive basis training for audio source separation with semi-supervised nonnegative matrix factorization

This paper addresses an audio source separation problem and proposes a new basis training method for semi-supervised nonnegative matrix factorization (NMF). In conventional semi-supervised NMF, the pretrained spectral bases for a target source can also represent undesired interfering sources, which degrades the separation performance. To solve this problem, we propose training two types of supervised bases for the target source: discriminative bases and reconstructive bases. In the training stage, the discriminative bases are trained to capture spectral components unique to the target source, maximizing their ability to discriminate the target from the other sources, whereas the reconstructive bases are trained to represent the complete spectra of the target source. The efficacy of the proposed method is confirmed by performing semi-supervised music source separation.
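For orientation, the conventional semi-supervised NMF baseline that the abstract refers to can be sketched as follows. This is a minimal illustration and not the proposed discriminative/reconstructive training: it assumes a Euclidean cost with standard multiplicative updates, and the names `F` (pretrained target bases, kept fixed), `G` (target activations), `U` and `Hu` (bases and activations learned for the other sources), and the function `semi_supervised_nmf` are hypothetical choices made here for illustration.

```python
import numpy as np

def semi_supervised_nmf(V, F, n_other=4, n_iter=200, eps=1e-12, seed=0):
    """Baseline semi-supervised NMF sketch: V ~= F @ G + U @ Hu.

    V : nonnegative mixture spectrogram (freq x frames)
    F : pretrained target bases (freq x k_t), held fixed (assumed given)
    Returns G, U, Hu. Euclidean multiplicative updates are an assumption.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_t = F.shape[1]
    U = rng.random((n_freq, n_other)) + eps          # free bases for other sources
    H = rng.random((k_t + n_other, n_frames)) + eps  # stacked activations [G; Hu]
    for _ in range(n_iter):
        W = np.hstack([F, U])
        # Update all activations jointly.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        G, Hu = H[:k_t], H[k_t:]
        # Update only the free bases; F is never modified.
        U *= (V @ Hu.T) / ((F @ G + U @ Hu) @ Hu.T + eps)
    return H[:k_t].copy(), U, H[k_t:].copy()

def rel_error(V, F, G, U, Hu):
    """Relative Frobenius reconstruction error of the model."""
    return np.linalg.norm(V - (F @ G + U @ Hu)) / np.linalg.norm(V)

# Synthetic demo data (random, purely illustrative).
rng = np.random.default_rng(1)
F_true = rng.random((64, 5))
V = F_true @ rng.random((5, 40)) + rng.random((64, 3)) @ rng.random((3, 40))

G1, U1, Hu1 = semi_supervised_nmf(V, F_true, n_other=3, n_iter=1)
G, U, Hu = semi_supervised_nmf(V, F_true, n_other=3, n_iter=200)
err_first = rel_error(V, F_true, G1, U1, Hu1)
err_final = rel_error(V, F_true, G, U, Hu)

# Target estimate via a soft (Wiener-like) mask on the mixture.
target_est = V * (F_true @ G) / (F_true @ G + U @ Hu + 1e-12)
```

Because `F` is fixed while `U` is free, nothing in this baseline prevents `F @ G` from absorbing energy that belongs to the interfering sources, which is the basis-sharing issue the abstract sets out to address.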
