Paper Information - Incremental Basis Estimation Adopting Global k-means Algorithm for NMF-Based Audio Source Separation

Nonnegative matrix factorization (NMF) is a data decomposition technique that can discover meaningful latent nonnegative components. Because the NMF objective function is non-convex, however, source separation performance can degrade when the iterative update of the basis matrix during training becomes stuck in a poor local minimum. Most previous studies update the whole basis matrix for a specific source iteratively from a random initialization to minimize a given objective function, although a few approaches to systematic initialization of the basis matrix have been proposed, such as singular value decomposition and k-means clustering. In this paper, we propose a robust basis estimation approach that adopts an incremental strategy. Based on an analogy between clustering and NMF analysis, we estimate the NMF bases in a way similar to the global k-means algorithm, which is popular in data clustering. Experiments on audio source separation showed that the proposed methods outperformed conventional NMF with random initialization by about 1.93 dB and 2.34 dB in signal-to-distortion ratio when the target source was speech and violin, respectively.
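The incremental idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rules, the objective, or the candidate-selection scheme, so everything below is an assumption made for illustration. The sketch assumes the Euclidean objective with the standard Lee-Seung multiplicative updates, grows the basis matrix one column at a time, and, by analogy with global k-means, tries several training frames as the starting point of the new basis and keeps the trial with the lowest reconstruction error. The names `nmf_updates`, `incremental_nmf`, and `n_candidates` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the multiplicative updates


def nmf_updates(V, W, H, n_iter=50):
    """Lee-Seung multiplicative updates for the objective ||V - W H||_F^2."""
    for _ in range(n_iter):
        H = H * (W.T @ V) / (W.T @ W @ H + EPS)
        W = W * (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def incremental_nmf(V, n_bases, n_candidates=5, n_iter=50, seed=0):
    """Estimate W (n_freq x n_bases) and H (n_bases x n_frames) one basis at a time.

    Assumption: candidates for each new basis are randomly chosen training
    frames, mirroring how global k-means tries data points as new centers.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n_frames = V.shape[1]

    # Start with a single basis initialized to the mean frame.
    W = V.mean(axis=1, keepdims=True) + EPS
    H = np.ones((1, n_frames))
    W, H = nmf_updates(V, W, H, n_iter)

    for _ in range(1, n_bases):
        candidates = rng.choice(n_frames, size=min(n_candidates, n_frames),
                                replace=False)
        best = None
        for j in candidates:
            # Append frame j as the new basis, with small initial activations
            # so the current reconstruction is only slightly perturbed.
            W_try = np.hstack([W, V[:, [j]] + EPS])
            H_try = np.vstack([H, np.full((1, n_frames), 1e-2 * H.mean())])
            W_try, H_try = nmf_updates(V, W_try, H_try, n_iter)
            err = np.linalg.norm(V - W_try @ H_try)
            if best is None or err < best[0]:
                best = (err, W_try, H_try)
        _, W, H = best  # keep the candidate with the lowest error
    return W, H
```

In a source separation setting, `V` would typically be a magnitude spectrogram of training data for one source, and the returned `W` would serve as that source's basis matrix. The cost grows with `n_candidates`, since every candidate requires a full run of the updates; global k-means has the same trade-off when it tries many data points per added cluster.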

Jong Won Shin | N. Kim | Kisoo Kwon
