Efficient algorithms for collaborative filtering

Collaborative filtering is a novel statistical technique to obtain useful information or to make predictions based on data from multiple agents. A large number of such datasets are naturally represented in matrix form. Typically, there exists a matrix M from which we know a (typically sparse) subset of entries M_ij for (i, j) in some set E. The problem then is to predict/approximate the unseen entries. This framework of matrix completion is extremely general and applications include personalized recommendation systems, sensor positioning, link prediction and so on. Low rank models have traditionally been used to learn useful information from such datasets. Low-dimensional representations simplify the description of the dataset and often yield predictive powers. As an added benefit, it is easier to store and retrieve low dimensional representations. Finally, many computationally intensive operations such as matrix multiplication and inversion are simplified with low dimensional representations. Singular Value Decomposition (SVD) has traditionally been used to find the lowdimensional representation of a fully revealed matrix. There are numerous algorithms for computing the SVD of a matrix including several parallel implementations and implementations for sparse matrices. However, when the matrix is only partially observed, we show that SVD techniques are sub-optimal. In this work, we will develop algorithms to learn a low rank model from a partially revealed matrix. These algorithms are computationally efficient and highly parallelizable. We will show that the proposed algorithms achieve a performance close to the fundamental limit in a number of scenarios. Finally, the algorithms achieve significantly better performance than the state-of-the-art algorithms on many real collaborative filtering datasets.