Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices

Numerous algorithms are used for nonnegative matrix factorization (NMF) under the assumption that the matrix is nearly separable. In this paper, we show how to make these algorithms scalable for data matrices that have many more rows than columns, so-called "tall-and-skinny" matrices. A key component of these improved methods is an orthogonal matrix transformation that preserves the separability of the NMF problem. Our final methods need to read the data matrix only once and are suitable for streaming, multi-core, and MapReduce architectures. We demonstrate the efficacy of these algorithms on terabyte-sized matrices from scientific computing and bioinformatics.
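The idea of a separability-preserving orthogonal transformation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `streaming_r` and `spa`, the synthetic data, and the block count are assumptions made for illustration, and a plain successive-projection routine stands in for whichever near-separable NMF algorithm is used. The sketch reads the rows of a tall matrix X once, keeps only a small n-by-n factor R with R^T R = X^T X, and checks that the column-selection step picks the same columns from R as from X.

```python
import numpy as np

def streaming_r(row_blocks):
    """One pass over row blocks, keeping only a small R factor (R^T R = X^T X)."""
    r = None
    for block in row_blocks:
        stacked = block if r is None else np.vstack([r, block])
        r = np.linalg.qr(stacked, mode="r")
    return r

def spa(a, k):
    """Successive projection: pick the largest-norm column, project it out, repeat."""
    resid = np.array(a, dtype=float)
    picked = []
    for _ in range(k):
        j = int(np.argmax(np.sum(resid**2, axis=0)))
        picked.append(j)
        u = resid[:, j] / np.linalg.norm(resid[:, j])
        resid = resid - np.outer(u, u @ resid)
    return picked

# Synthetic separable matrix (assumed test data): the first k columns are the
# extreme columns; the rest are convex combinations of them.
rng = np.random.default_rng(0)
m, n, k = 2000, 10, 3
w = rng.random((m, k))
h_rest = rng.dirichlet(np.ones(k), size=n - k).T  # k x (n - k), columns sum to 1
x = w @ np.hstack([np.eye(k), h_rest])

r = streaming_r(np.array_split(x, 20))  # 20 row blocks, each read once
cols_x = spa(x, k)                      # selection on the full m x n matrix
cols_r = spa(r, k)                      # selection on the small n x n factor
print(r.shape, cols_x, cols_r)
```

Because R differs from X only by a transformation with orthonormal columns, column norms and inner products are unchanged, so the selection step can run on the n-by-n factor instead of the full data matrix.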
