Matrix Completion from Power-Law Distributed Samples

The low-rank matrix completion problem is a fundamental problem with many important applications. Recently, [4],[13] and [5] obtained the first non-trivial theoretical results for the problem assuming that the observed entries are sampled uniformly at random. Unfortunately, most real-world datasets do not satisfy this assumption, but instead exhibit power-law distributed samples. In this paper, we propose a graph theoretic approach to matrix completion that solves the problem for more realistic sampling models. Our method is simpler to analyze than previous methods with the analysis reducing to computing the threshold for complete cascades in random graphs, a problem of independent interest. By analyzing the graph theoretic problem, we show that our method achieves exact recovery when the observed entries are sampled from the Chung-Lu-Vu model, which can generate power-law distributed graphs. We also hypothesize that our algorithm solves the matrix completion problem from an optimal number of entries for the popular preferential attachment model and provide strong empirical evidence for the claim. Furthermore, our method is easy to implement and is substantially faster than existing methods. We demonstrate the effectiveness of our method on random instances where the low-rank matrix is sampled according to the prevalent random graph models for complex networks and present promising preliminary results on the Netflix challenge dataset.

[1]  Endre Szemerédi,et al.  On the second eigenvalue of random regular graphs , 1989, STOC '89.

[2]  Jon M Kleinberg,et al.  Hubs, authorities, and communities , 1999, CSUR.

[3]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[4]  Christos H. Papadimitriou,et al.  On the Eigenvalue Power Law , 2002, RANDOM.

[5]  Jon Kleinberg,et al.  Maximizing the spread of influence through a social network , 2003, KDD '03.

[6]  Matthew Brand,et al.  Fast Online SVD Revisions for Lightweight Recommender Systems , 2003, SDM.

[7]  Christos Gkantsidis,et al.  Spectral analysis of Internet topologies , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[8]  F. Chung,et al.  Spectra of random graphs with given expected degrees , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[9]  U. Feige,et al.  Spectral techniques applied to sparse random graphs , 2005 .

[10]  Éva Tardos,et al.  Influential Nodes in a Diffusion Model for Social Networks , 2005, ICALP.

[11]  Yehuda Koren,et al.  Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[12]  Christos Faloutsos,et al.  Graph evolution: Densification and shrinking diameters , 2006, TKDD.

[13]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[14]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..

[15]  Mark E. J. Newman,et al.  Power-Law Distributions in Empirical Data , 2007, SIAM Rev..

[16]  Andrea Montanari,et al.  Matrix completion from a few entries , 2009, ISIT.

[17]  Silvio Lattanzi,et al.  Affiliation networks , 2009, STOC '09.

[18]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[19]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[20]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation: Near-Optimal Matrix Completion , 2009, IEEE Transactions on Information Theory.