Discriminative non-negative matrix factorization for single-channel speech separation

Non-negative matrix factorization (NMF) has emerged as a promising approach for single-channel speech separation. In this paper, we propose a new method of discriminative learning of NMF. In contrast to conventional approaches where the basis vectors are learned independently on clean signals from each speaker, our approach optimizes all basis vectors jointly to reconstruct both clean signals and mixed signals well. Our empirical studies validated our approach. Specifically, discriminative NMF outperforms standard methods by a large margin in improving signal-to-noise ratio for reconstructing signals.

[1]  Yoshitaka Nakajima,et al.  Auditory Scene Analysis: The Perceptual Organization of Sound Albert S. Bregman , 1992 .

[2]  G. Kramer Auditory Scene Analysis: The Perceptual Organization of Sound by Albert Bregman (review) , 2016 .

[3]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[4]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[5]  Patrik O. Hoyer,et al.  Non-negative sparse coding , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[6]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[7]  J. Eggert,et al.  Sparse coding and NMF , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[8]  Daniel P. W. Ellis,et al.  Model-Based Monaural Source Separation Using a Vector-Quantized Phase-Vocoder Representation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[9]  Jon Barker,et al.  An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.

[10]  Mikkel N. Schmidt,et al.  Single-channel speech separation using sparse non-negative matrix factorization , 2006, INTERSPEECH.

[11]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Hakan Erdogan,et al.  Single Channel Speech Music Separation Using Nonnegative Matrix Factorization with Sliding Windows and Spectral Masks , 2011, INTERSPEECH.

[13]  Pascal Vincent,et al.  Discriminative Non-negative Matrix Factorization for Multiple Pitch Estimation , 2012, ISMIR.

[14]  Björn Schuller,et al.  Semi-Supervised Suppression of Background Music in Monaural Speech Recordings , 2011 .

[15]  Hakan Erdogan,et al.  Discriminative nonnegative dictionary learning using cross-coherence penalties for single channel source separation , 2013, INTERSPEECH.

[16]  Hakan Erdogan,et al.  Regularized nonnegative matrix factorization using Gaussian mixture priors for supervised single channel source separation , 2013, Comput. Speech Lang..