Computing a nonnegative matrix factorization -- provably

The Nonnegative Matrix Factorization (NMF) problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where the factorization is computed using a variety of local search heuristics. Vavasis recently proved that this problem is NP-complete. We initiate a study of when this problem is solvable in polynomial time. Consider a nonnegative $m \times n$ matrix $M$ and a target inner-dimension $r$; the goal is to write $M = AW$, where $A$ is a nonnegative $m \times r$ matrix and $W$ is a nonnegative $r \times n$ matrix. Our results are the following:

- We give a polynomial-time algorithm for exact and approximate NMF for every constant $r$. Indeed, NMF is most interesting in applications precisely when $r$ is small. We complement this with a hardness result: if exact NMF can be solved in time $(nm)^{o(r)}$, then 3-SAT has a sub-exponential time algorithm. Hence, substantial improvements to the above algorithm are unlikely.
- We give an algorithm that runs in time polynomial in $n$, $m$, and $r$ under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings.

To the best of our knowledge, this last result is the first polynomial-time algorithm that provably works under a non-trivial condition on the input matrix, and we believe that this will be an interesting and important direction for future work.
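To make the separability condition mentioned above concrete, here is a small sketch for the exact, noise-free case. Write $M = AW$ and call the factorization separable when, for every column $k$ of $A$, some row of $A$ is nonzero only in column $k$; the corresponding "anchor" row of $M$ is then a scaled copy of row $k$ of $W$, and every row of $M$, once normalized to sum to 1, is a convex combination of the normalized anchor rows. This is not the paper's algorithm. To stay dependency-free, it swaps in the successive projection algorithm (SPA), a different and well-known method for separable NMF, to locate the anchors, and then recovers $A$ by ordinary least squares. The function names (`dot`, `normalize_rows`, `find_anchor_rows`, `solve`, `separable_nmf`) are hypothetical. The sketch assumes every row of $M$ is nonzero and $W$ has full row rank, and it makes no attempt at noise tolerance.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize_rows(M: Matrix) -> Matrix:
    """Scale each nonzero, nonnegative row of M so that its entries sum to 1."""
    return [[x / sum(row) for x in row] for row in M]


def find_anchor_rows(M: Matrix, r: int) -> List[int]:
    """Successive projection algorithm (SPA) on the row-normalized matrix.

    Under exact separability every normalized row of M is a convex combination
    of the r normalized anchor rows, so the row of largest Euclidean norm is an
    anchor. Projecting every row onto the orthogonal complement of that anchor
    and repeating finds one new anchor per step (W must have full row rank).
    """
    R = normalize_rows(M)
    anchors: List[int] = []
    for _ in range(r):
        norms = [dot(row, row) for row in R]
        j = max(range(len(R)), key=lambda i: norms[i])
        anchors.append(j)
        u, uu = R[j], norms[j]
        # Remove from every residual row its component along the chosen anchor.
        coefs = [dot(row, u) / uu for row in R]
        R = [[x - c * y for x, y in zip(row, u)] for row, c in zip(R, coefs)]
    return anchors


def solve(G: Matrix, b: List[float]) -> List[float]:
    """Solve the small dense system G x = b by Gaussian elimination with partial pivoting."""
    n = len(G)
    aug = [G[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def separable_nmf(M: Matrix, r: int) -> Tuple[Matrix, Matrix, List[int]]:
    """Return (A, W, anchors) with M = A W, assuming M is exactly separable."""
    anchors = find_anchor_rows(M, r)
    Mn = normalize_rows(M)
    W = [Mn[j] for j in anchors]  # rows of W are the normalized anchor rows
    G = [[dot(wi, wj) for wj in W] for wi in W]  # Gram matrix W W^T
    A: Matrix = []
    for row, nrow in zip(M, Mn):
        # Least-squares coefficients c with nrow = c^T W (normal equations).
        c = solve(G, [dot(w, nrow) for w in W])
        s = sum(row)  # undo the row normalization
        A.append([s * ci for ci in c])
    return A, W, anchors
```

Because the anchors are normalized, the recovered $W$ and $A$ agree with any planted factors only up to a positive rescaling of the rows of $W$ and a permutation, which is the usual non-uniqueness of NMF. With noisy data the plain least-squares step can return small negative coefficients; a nonnegative least-squares solve would be a natural replacement there.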

[1] Patrik O. Hoyer, et al. Non-negative Matrix Factorization with Sparseness Constraints, 2004, J. Mach. Learn. Res.

[2] Ravi Kumar, et al. Recommendation Systems, 2001.

[3] L. Henry, et al. Schémas de nuptialité : déséquilibre des sexes et célibat, 1969.

[4] Mihai Patrascu, et al. On the possibility of faster SAT algorithms, 2010, SODA '10.

[5] Gene H. Golub, et al. Matrix computations, 1983.

[6] Xin Liu, et al. Document clustering based on non-negative matrix factorization, 2003, SIGIR.

[7] E. Harding. The number of partitions of a set of N points in k dimensions induced by hyperplanes, 1967, Proceedings of the Edinburgh Mathematical Society.

[8] A. Seidenberg. A New Decision Method for Elementary Algebra, 1954.

[9] Joel A. Tropp, et al. Factoring nonnegative matrices with linear programs, 2012, NIPS.

[10] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.

[11] Lenore Blum, et al. Complexity and Real Computation, 1997, Springer New York.

[12] Jiri Matousek, et al. Lectures on discrete geometry, 2002, Graduate texts in mathematics.

[13] Thomas Hofmann, et al. Probabilistic Latent Semantic Analysis, 1999, UAI.

[14] Michael E. Saks, et al. Communication Complexity and Combinatorial Lattice Theory, 1993, J. Comput. Syst. Sci.

[15] Jon M. Kleinberg, et al. Using mixture models for collaborative filtering, 2004, STOC '04.

[16] Hans Raj Tiwary, et al. Extended Formulations for Polygons, 2011, Discret. Comput. Geom.

[17] N. Nisan. Lower Bounds for Non-Commutative Computation (Extended Abstract), 1991, STOC 1991.

[18] Erkki Oja, et al. Independent Component Analysis, 2001.

[19] G. Sacks. A Decision Method for Elementary Algebra and Geometry, 2003.

[20] H. Sebastian Seung, et al. Algorithms for Non-negative Matrix Factorization, 2000, NIPS.

[21] Dima Grigoriev, et al. Solving Systems of Polynomial Inequalities in Subexponential Time, 1988, J. Symb. Comput.

[22] Noga Alon, et al. Separable Partitions, 1999, Discret. Appl. Math.

[23] James Renegar, et al. On the Computational Complexity and Geometry of the First-Order Theory of the Reals, Part I: Introduction. Preliminaries. The Geometry of Semi-Algebraic Sets. The Decision Problem for the Existential Theory of the Reals, 1992, J. Symb. Comput.

[24] G. Buchsbaum, et al. Color categories revealed by non-negative matrix factorization of Munsell color spectra, 2002, Vision Research.

[25] Marie-Françoise Roy, et al. On the combinatorial and algebraic complexity of Quantifier Elimination, 1994.

[26] Vikas Sindhwani, et al. Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization, 2012, ICML.

[27] Sanjeev Arora, et al. A Practical Algorithm for Topic Modeling with Provable Guarantees, 2012, ICML.

[28] Joel E. Cohen, et al. Nonnegative ranks, decompositions, and factorizations of nonnegative matrices, 1993.

[29] Victoria Stodden, et al. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?, 2003, NIPS.

[30] P. Paatero, et al. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, 1994.

[31] Santosh S. Vempala, et al. Latent semantic indexing: a probabilistic analysis, 1998, PODS '98.

[32] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[33] J. Renegar, et al. On the Computational Complexity and Geometry of the First-Order Theory of the Reals, Part I, 1989.

[34] Noam Nisan, et al. Lower bounds for non-commutative computation, 1991, STOC '91.

[35] Russell Impagliazzo, et al. On the Complexity of k-SAT, 2001, J. Comput. Syst. Sci.

[36] E. A. Sylvestre, et al. Self Modeling Curve Resolution, 1971.

[37] Santosh S. Vempala, et al. Latent Semantic Indexing, 2000, PODS 2000.

[38] Uriel G. Rothblum, et al. On the number of separable partitions, 2011, J. Comb. Optim.

[39] Ankur Moitra. An Almost Optimal Algorithm for Computing Nonnegative Rank, 2013, SODA.

[40] Mihalis Yannakakis, et al. Expressing combinatorial optimization problems by linear programs, 1991, STOC '88.

[41] Alfred V. Aho, et al. On notions of information transfer in VLSI circuits, 1983, STOC.

[42] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[43] Timothy H. McNicholl. Review of "Complexity and real computation" by Blum, Cucker, Shub, and Smale. Springer-Verlag, 2001, SIGA.

[44] Stephen A. Vavasis, et al. On the Complexity of Nonnegative Matrix Factorization, 2007, SIAM J. Optim.

[45] Sanjeev Arora, et al. Learning Topic Models -- Going beyond SVD, 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.