Sparse NMF – half-baked or well done?

Non-negative matrix factorization (NMF) has been a popular method for modeling audio signals, in particular for single-channel source separation. An important factor in the success of NMF-based algorithms is the "quality" of the basis functions obtained from training data. To model rich signals such as speech or wide ranges of non-stationary noises, NMF typically requires a large number of basis functions. However, without additional constraints, a large number of bases leads to trivial solutions in which the bases can indiscriminately model any signal. Two main approaches have been considered to cope with this issue: introducing sparsity on the activation coefficients, or skipping training altogether and randomly selecting basis functions as a subset of the training data ("exemplar-based NMF"). Surprisingly, the sparsity route is widely regarded as leading to similar or worse results than the simple and extremely efficient (no training!) exemplar-based approach. Only a small fraction of researchers have realized that sparse NMF works well if implemented correctly. To our knowledge, however, no thorough comparison has been presented in the literature, and many researchers in the field may remain unaware of this fact. We review exemplar-based NMF as well as two versions of sparse NMF, a simplistic ad hoc one and a principled one, giving a detailed derivation of the update equations for the latter in the general case of beta divergences, and we perform a thorough comparison of the three methods on a speech separation task using the 2nd CHiME Speech Separation and Recognition Challenge dataset. Results show that, contrary to a popular belief in the community, learning basis functions using NMF with sparsity, if done the right way, leads to significant gains in source-to-distortion ratio with respect to both exemplar-based NMF and the ad hoc implementation of sparse NMF.
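For concreteness, the following is a minimal sketch of the simplistic ad hoc variant of sparse NMF mentioned above, not the paper's principled derivation: multiplicative updates for the generalized KL divergence (beta = 1), with the L1 penalty on the activations simply added to the denominator of the H update and the basis columns of W renormalized after each iteration. Function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def sparse_nmf_kl(V, rank, sparsity=0.1, n_iter=200, eps=1e-9, seed=0):
    """Ad hoc sparse NMF sketch, generalized KL divergence (beta = 1).

    Minimizes roughly D_KL(V | WH) + sparsity * sum(H) via multiplicative
    updates; the sparsity term is folded into the H update denominator and
    W's columns are renormalized to unit norm after each W update (the
    'ad hoc' treatment, not jointly derived update rules).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # H update: H <- H * (W^T (V / WH)) / (W^T 1 + lambda)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + sparsity)
        # W update: W <- W * ((V / WH) H^T) / (1 H^T), then renormalize
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
        W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
    return W, H

# Usage: factorize a toy nonnegative "spectrogram"
V = np.abs(np.random.default_rng(1).standard_normal((20, 50)))
W, H = sparse_nmf_kl(V, rank=5, sparsity=0.5)
```

The post-hoc renormalization of W is exactly the step that makes this variant inconsistent with the penalized objective (it silently rescales WH), which is the flaw the principled derivation in the paper avoids.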
Jonathan Le Roux (Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA), leroux@merl.com
Felix Weninger (TUM, Munich, Germany), felix@weninger.de
John R. Hershey (Mitsubishi Electric Research Labs (MERL), Cambridge, MA, USA), hershey@merl.com

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2015. 201 Broadway, Cambridge, Massachusetts 02139.
