Matrix Factorization and Matrix Concentration

Motivated by the constrained factorization problems of sparse principal components analysis (PCA) for gene expression modeling, low-rank matrix completion for recommender systems, and robust matrix factorization for video surveillance, this dissertation explores the modeling, methodology, and theory of matrix factorization. We begin by exposing the theoretical and empirical shortcomings of standard deflation techniques for sparse PCA and developing alternative methodology more suitable for deflation with sparse “pseudo-eigenvectors.” We then explicitly reformulate the sparse PCA optimization problem and derive a generalized deflation procedure that typically outperforms more standard techniques on real-world datasets. We next develop a fully Bayesian matrix completion framework for integrating the complementary approaches of discrete mixed membership modeling and continuous matrix factorization. We introduce two Mixed Membership Matrix Factorization (M3F) models, develop highly parallelizable Gibbs sampling inference procedures, and find that M3F is both more parsimonious and more accurate than state-of-the-art baselines on real-world collaborative filtering datasets. Our third contribution is Divide-Factor-Combine (DFC), a parallel divide-and-conquer framework for boosting the scalability of a matrix completion or robust matrix factorization algorithm while retaining its theoretical guarantees. Our experiments demonstrate the near-linear to super-linear speed-ups attainable with this approach, and our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm. Finally, inspired by the analyses of matrix completion and randomized factorization procedures, we show how Stein’s method of exchangeable pairs can be used to derive concentration inequalities for matrix-valued random elements. 
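The deflation shortcoming mentioned above can be made concrete with a small example. This is a minimal numpy sketch. It assumes the standard Hotelling deflation A_t = A_{t-1} - (x^T A_{t-1} x) x x^T and the projection deflation A_t = (I - x x^T) A_{t-1} (I - x x^T) for a unit vector x. When x is a sparse vector that is not an eigenvector of A, Hotelling's update does not annihilate x and can produce an indefinite matrix. The projection update keeps A_t x = 0 and keeps A_t positive semidefinite. The function names and the 2 x 2 matrix are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

def hotelling_deflate(A, x):
    """Hotelling deflation: subtract the Rayleigh-quotient-weighted outer product."""
    x = x / np.linalg.norm(x)                  # work with a unit vector
    return A - (x @ A @ x) * np.outer(x, x)

def projection_deflate(A, x):
    """Projection deflation: restrict A to the orthogonal complement of x."""
    x = x / np.linalg.norm(x)
    P = np.eye(len(x)) - np.outer(x, x)        # projector onto x's complement
    return P @ A @ P

# A positive definite matrix and a sparse "pseudo-eigenvector" (not an eigenvector of A).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, 0.0])

H = hotelling_deflate(A, x)     # [[0, 1], [1, 2]]
Pd = projection_deflate(A, x)   # [[0, 0], [0, 2]]

print(np.linalg.eigvalsh(H))    # eigenvalues 1 - sqrt(2) < 0 and 1 + sqrt(2): H is indefinite
print(H @ x)                    # nonzero: Hotelling's update does not annihilate x
print(np.linalg.eigvalsh(Pd))   # eigenvalues 0 and 2: Pd stays positive semidefinite
print(Pd @ x)                   # zero vector: projection deflation annihilates x
```

The example is meant only to show the failure mode. It is not the generalized deflation procedure derived in the dissertation.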
As an immediate consequence, we obtain analogues of classical moment inequalities and exponential tail inequalities for sums of independent and of dependent random matrices. We also derive comparable concentration inequalities for self-bounding matrix functions of dependent random elements.
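The exchangeable-pairs idea can be illustrated in its simplest setting, a matrix Rademacher series X = sum_k eps_k A_k with fixed symmetric A_k and independent random signs eps_k. This sketch treats the following definitions from the exchangeable-pairs literature as assumptions: (X, X') is a matrix Stein pair with scale factor alpha if E[X - X' | Z] = alpha X, and its conditional variance is Delta_X = (2 alpha)^{-1} E[(X - X')^2 | Z]. We form X' by replacing one uniformly chosen sign with a fresh independent sign. For this construction, alpha = 1/n and Delta_X = sum_k A_k^2. The code computes both conditional expectations exactly by enumerating every outcome. The helper names are hypothetical.

```python
import numpy as np

def rademacher_series(eps, As):
    """X = sum_k eps_k A_k for fixed symmetric matrices A_k."""
    return sum(e * A for e, A in zip(eps, As))

def stein_pair_moments(eps, As):
    """Hypothetical helper: exact E[X - X' | eps] and Delta_X when X' replaces
    the sign at a uniformly chosen index J with a fresh independent sign."""
    n = len(As)
    alpha = 1.0 / n                       # scale factor for this exchangeable pair
    X = rademacher_series(eps, As)
    d = As[0].shape[0]
    drift = np.zeros((d, d))              # accumulates E[X - X' | eps]
    second = np.zeros((d, d))             # accumulates E[(X - X')^2 | eps]
    for k in range(n):                    # J = k with probability 1/n
        for new_sign in (-1.0, 1.0):      # fresh sign, probability 1/2 each
            diff = (eps[k] - new_sign) * As[k]   # X - X' for this outcome
            w = 0.5 / n
            drift += w * diff
            second += w * (diff @ diff)
    Delta = second / (2.0 * alpha)        # conditional variance Delta_X
    return X, drift, Delta, alpha

rng = np.random.default_rng(0)
d, n = 3, 5
As = []
for _ in range(n):
    B = rng.standard_normal((d, d))
    As.append((B + B.T) / 2)              # fixed symmetric coefficient matrices
eps = rng.choice([-1.0, 1.0], size=n)     # one realization of the signs

X, drift, Delta, alpha = stein_pair_moments(eps, As)
print(np.allclose(drift, alpha * X))                 # Stein pair identity holds
print(np.allclose(Delta, sum(A @ A for A in As)))    # Delta_X equals sum_k A_k^2
```

In this example the conditional variance does not depend on the signs. For general dependent sums this is not the case, and bounds must then control Delta_X directly.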