Tuning pruning in sparse non-negative matrix factorization

Non-negative matrix factorization (NMF) has become a popular tool for exploratory analysis due to its part-based, easily interpretable representation. Sparseness is commonly imposed in NMF (sparse NMF, SNMF) through ℓ1-norm regularization, both to alleviate the non-uniqueness of the NMF representation and to promote sparse (i.e., part-based) representations. While sparseness can prune excess components, and thereby potentially also establish the number of components, it remains an open problem what constitutes an adequate degree of sparseness, i.e., how to tune the pruning. In a hierarchical Bayesian framework, SNMF corresponds to imposing exponential priors on the factors, and the regularization strengths can be expressed in terms of the hyper-parameters of these priors. Thus, within the Bayesian modelling framework, Automatic Relevance Determination (ARD) can learn these pruning strengths from the data. We demonstrate on three benchmark NMF data sets how the proposed ARD framework can be used to tune the pruning and thereby also estimate the NMF model order.
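To make the idea concrete, the following is a minimal sketch (not the authors' exact algorithm) of SNMF with per-component ARD pruning. It assumes a Gaussian likelihood with standard multiplicative updates, an exponential prior with rate lambda[k] on each row of H, and a Gamma(a, b) hyper-prior on each lambda[k] updated by its MAP estimate; the function name, hyper-parameter values, and pruning threshold are illustrative choices only.

```python
import numpy as np

def snmf_ard(V, K, n_iter=500, a=1.0, b=1.0, eps=1e-9, seed=0):
    """Sparse NMF V ~ W H with ARD-tuned pruning strengths (illustrative sketch).

    Gaussian likelihood, exponential prior H[k, j] ~ Exp(lambda[k]),
    Gamma(a, b) hyper-prior on each lambda[k]. Each lambda[k] is updated
    by its MAP estimate; components whose rows of H are driven to zero
    are pruned, which estimates the model order.
    """
    rng = np.random.default_rng(seed)
    I, J = V.shape
    W = rng.random((I, K))
    H = rng.random((K, J))
    lam = np.ones(K)  # per-component regularization (pruning) strengths

    for _ in range(n_iter):
        # Multiplicative update of H with l1 penalty lam[k] on row k
        H *= (W.T @ V) / (W.T @ W @ H + lam[:, None] + eps)

        # Multiplicative update of W; columns are rescaled to unit l2-norm
        # so that the sparsity penalty acts only on H
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]

        # MAP update of lambda[k] from its Gamma(a + J, b + sum_j H[k, j]) posterior
        lam = (J + a - 1.0) / (H.sum(axis=1) + b)

    # Keep only components that were not pruned away (threshold is heuristic)
    active = H.sum(axis=1) > 1e-6 * H.sum()
    return W[:, active], H[active], lam

# Example use: factorize a random non-negative matrix with a generous K
# and let the ARD updates prune the excess components.
V = np.abs(np.random.default_rng(1).normal(size=(50, 200)))
W, H, lam = snmf_ard(V, K=10)
print("estimated model order:", W.shape[1])
```

In this sketch the data determine each lambda[k]: rows of H that carry little signal receive ever larger penalties and are driven to zero, so the surviving components give an estimate of the model order rather than requiring it to be fixed in advance.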
