Automatic Relevance Determination in Nonnegative Matrix Factorization with the β-Divergence

This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the β-divergence. The β-divergence is a family of cost functions that includes the squared Euclidean distance and the Kullback-Leibler (KL) and Itakura-Saito (IS) divergences as special cases. Learning the model order is important because it is needed to strike the right balance between data fidelity and overfitting. We propose a Bayesian model based on automatic relevance determination (ARD) in which each column of the dictionary matrix and the corresponding row of the activation matrix are tied together through a common scale parameter in their prior. A family of majorization-minimization (MM) algorithms is proposed for maximum a posteriori (MAP) estimation. In the course of inference, a subset of the scale parameters is driven to a small lower bound, which prunes the corresponding spurious components. We demonstrate the efficacy and robustness of our algorithms through extensive experiments on synthetic data, the swimmer dataset, a music decomposition example, and a stock price prediction task.
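To make the β-divergence concrete: for x ≥ 0 and y > 0 it is commonly defined as d_β(x|y) = (x^β + (β − 1)y^β − βxy^(β−1)) / (β(β − 1)) for β ∉ {0, 1}, with the KL divergence x log(x/y) − x + y at β = 1 and the IS divergence x/y − log(x/y) − 1 at β = 0 obtained as limits. At β = 2 it reduces to half the squared Euclidean distance, (x − y)²/2. The NMF cost is the sum of d_β over all entries of V and WH. Below is a minimal NumPy sketch of this standard definition; the function name `beta_divergence` is our own and is not taken from the paper.

```python
import numpy as np


def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x | y).

    x: nonnegative array (strictly positive when beta == 0);
    y: strictly positive array of the same shape.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:
        # Itakura-Saito divergence: x/y - log(x/y) - 1
        r = x / y
        return float(np.sum(r - np.log(r) - 1.0))
    if beta == 1:
        # Kullback-Leibler divergence, with the convention 0 * log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            xlogxy = np.where(x > 0, x * np.log(x / y), 0.0)
        return float(np.sum(xlogxy - x + y))
    # General case, beta not in {0, 1}
    num = x**beta + (beta - 1) * y**beta - beta * x * y ** (beta - 1)
    return float(np.sum(num) / (beta * (beta - 1)))


v = np.array([1.0, 2.0, 4.0])
v_hat = np.array([2.0, 2.0, 3.0])
for beta in (0, 0.5, 1):
    print(beta, beta_divergence(v, v_hat, beta))
print(beta_divergence(v, v_hat, 2))  # 1.0, half the squared Euclidean distance
```

The IS case (β = 0) is scale invariant, d_0(λx|λy) = d_0(x|y), which is one reason it is popular for audio spectrograms with a large dynamic range.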

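The ARD pruning mechanism can be illustrated with a small sketch. It rests on assumptions that are ours and are not statements of the paper's exact model: each entry of the k-th dictionary column w_k and of the k-th activation row h_k gets an exponential prior with the shared scale λ_k, λ_k gets an inverse-gamma prior with hypothetical hyperparameters a and b, and the likelihood contributes D_β(V|WH)/φ with a hypothetical dispersion φ. The MAP objective is then D_β(V|WH)/φ + Σ_k [(‖w_k‖₁ + ‖h_k‖₁ + b)/λ_k + c log λ_k] with c = F + N + a + 1, so the best λ_k for fixed W and H is (‖w_k‖₁ + ‖h_k‖₁ + b)/c, which can never drop below b/c. W and H are updated multiplicatively, with the prior's gradient φ/λ_k added to the denominator and exponent 1/(2 − β). By our own derivation this is an exact MM step only for β ≤ 1, so the hypothetical function `ard_nmf_l1` rejects larger β rather than guessing the update there.

```python
import numpy as np


def beta_div(X, Y, beta):
    """Sum of elementwise beta-divergences; assumes X > 0 and Y > 0."""
    if beta == 0:
        R = X / Y
        return float(np.sum(R - np.log(R) - 1.0))
    if beta == 1:
        return float(np.sum(X * np.log(X / Y) - X + Y))
    num = X**beta + (beta - 1) * Y**beta - beta * X * Y ** (beta - 1)
    return float(np.sum(num) / (beta * (beta - 1)))


def map_objective(V, W, H, lam, beta, a, b, phi):
    """Negative log-posterior, up to constants, of the assumed l1-ARD model."""
    F, N = V.shape
    c = F + N + a + 1.0
    S = W.sum(axis=0) + H.sum(axis=1) + b  # ||w_k||_1 + ||h_k||_1 + b
    return beta_div(V, W @ H, beta) / phi + float(np.sum(S / lam + c * np.log(lam)))


def ard_nmf_l1(V, K, beta=1.0, a=5.0, b=1.0, phi=1.0, n_iter=200, seed=0):
    """Hypothetical l1-ARD NMF sketch for beta <= 1; returns W, H, lam, objective trace."""
    if beta > 1:
        raise ValueError("this sketch only covers beta <= 1")
    rng = np.random.default_rng(seed)
    F, N = V.shape
    c = F + N + a + 1.0
    g = 1.0 / (2.0 - beta)  # MM exponent, assumed valid for beta <= 1 (equals 1 at beta = 1)
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, N)) + 0.1
    lam = (W.sum(axis=0) + H.sum(axis=1) + b) / c
    trace = [map_objective(V, W, H, lam, beta, a, b, phi)]
    for _ in range(n_iter):
        # H step: the prior adds phi / lam_k to the denominator of row k
        WH = W @ H
        H *= (W.T @ (WH ** (beta - 2) * V) / (W.T @ WH ** (beta - 1) + phi / lam[:, None])) ** g
        # W step: same form, with penalty phi / lam_k on column k
        WH = W @ H
        W *= ((WH ** (beta - 2) * V) @ H.T / (WH ** (beta - 1) @ H.T + phi / lam[None, :])) ** g
        # Exact minimization over lam; lam_k >= b / c always holds
        lam = (W.sum(axis=0) + H.sum(axis=1) + b) / c
        trace.append(map_objective(V, W, H, lam, beta, a, b, phi))
    return W, H, lam, trace


# Usage: positive rank-3 data, deliberately over-specified with K = 8 components
rng = np.random.default_rng(1)
V = rng.random((30, 3)) @ rng.random((3, 40)) + 0.01
W, H, lam, trace = ard_nmf_l1(V, K=8, beta=1.0, n_iter=300)
print(np.round(lam, 4))  # entries close to b / c = 1 / 76 mark pruned components
```

With more components than the data need, one would expect several λ_k to settle near the floor b/c while the matching columns of W and rows of H shrink toward zero, which is the pruning effect described above. Because every step either minimizes a majorizer or minimizes exactly over λ, the objective trace should be non-increasing, which makes a convenient sanity check.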