Algorithms for Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the Expectation-Maximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
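A minimal sketch of the least-squares variant may make the abstract concrete. It assumes the commonly stated form of the multiplicative updates for V ≈ WH, namely H ← H ∘ (WᵀV) / (WᵀWH) and W ← W ∘ (VHᵀ) / (WHHᵀ), with element-wise product and division. The function names, the random initialization, and the small `eps` guard in the denominators are illustrative assumptions, not details taken from the abstract. The squared error is recorded after each iteration so that the monotonic behaviour described above can be inspected.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Sum of squared entries of V - WH (the least-squares objective)."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def scale(M, num, den, eps):
    """Element-wise multiplicative step: M * num / (den + eps)."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def nmf_least_squares(V, rank, iterations=100, eps=1e-12, seed=0):
    """Factor a non-negative matrix V (list of rows) as W times H.

    eps is a hypothetical guard against division by zero; the seed and
    the positive random initialization are likewise assumptions.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [squared_error(V, W, H)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
        errors.append(squared_error(V, W, H))
    return W, H, errors


if __name__ == "__main__":
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [1.0, 1.0, 1.0]]
    W, H, errors = nmf_least_squares(V, rank=2, iterations=50)
    print("initial error:", errors[0])
    print("final error:  ", errors[-1])
```

Because each step only multiplies by a non-negative ratio, entries of W and H that start non-negative stay non-negative without any explicit projection. The recorded errors should be non-increasing up to floating-point tolerance. The Kullback-Leibler variant follows the same pattern with a different multiplicative factor and is not shown here.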
