Analysis of a Collaborative Filter Based on Popularity Amongst Neighbors

In this paper, we analyze a collaborative filter that answers a simple question: What is popular amongst your “friends”? Although this basic principle appears in many practical implementations, there has been little theoretical analysis of its performance, and we partly fill this gap. Recent works on this topic, such as the low-rank matrix completion literature, consider the probability of error in recovering the entire rating matrix; we instead consider the probability of error in an individual recommendation, i.e., the bit error rate (BER). For a mathematical model introduced by Aditya et al. in 2009 and 2011, we identify three regimes of operation for our algorithm (named Popularity Amongst Friends) in the limit as the matrix size grows to infinity. In a regime characterized by a large number of samples and small degrees of freedom (defined precisely for the model in the paper), the asymptotic BER is zero. In a regime characterized by a large number of samples and large degrees of freedom, the asymptotic BER is bounded away from 0 and 1/2 (and is identified exactly except for a special case). In a regime characterized by a small number of samples, the algorithm fails. We then compare these results with the performance of the optimal recommender. We also present numerical results for the MovieLens and Netflix datasets, discuss the empirical performance in light of our theoretical results, and compare with an approach based on low-rank matrix completion.
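The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of recommending what is popular amongst a user's "friends", together with an empirical per-recommendation error rate. Everything specific here is an assumption made for illustration and is not taken from the paper: ratings are taken to be +1/-1 with 0 marking an unobserved entry, "friends" are the `k` users who agree most with the target user on co-rated items, and the function names `popularity_amongst_friends` and `bit_error_rate` are hypothetical.

```python
import numpy as np


def popularity_amongst_friends(ratings, user, item, k=3):
    """Predict +1 or -1 for (user, item); 0 means the vote was tied or empty.

    Assumed encoding: ratings[u, i] is +1 or -1 if observed, 0 otherwise.
    """
    # Hide the target column so it cannot influence the choice of friends.
    others = np.delete(ratings, item, axis=1)
    # Agreements minus disagreements on co-rated items (zeros contribute nothing).
    similarity = (others @ others[user]).astype(float)
    # Exclude the user themselves and anyone who has not rated the item.
    similarity[user] = -np.inf
    similarity[ratings[:, item] == 0] = -np.inf
    # Take the k most similar remaining users as "friends".
    friends = np.argsort(-similarity, kind="stable")[:k]
    friends = friends[np.isfinite(similarity[friends])]
    # Recommend whatever is popular amongst those friends (majority vote).
    return int(np.sign(ratings[friends, item].sum()))


def bit_error_rate(ratings, held_out, k=3):
    """Fraction of held-out (user, item, truth) entries predicted incorrectly."""
    errors = 0
    for user, item, truth in held_out:
        masked = ratings.copy()
        masked[user, item] = 0  # pretend this entry was never observed
        errors += popularity_amongst_friends(masked, user, item, k) != truth
    return errors / len(held_out)
```

For example, on a small matrix with two groups of like-minded users, hiding one entry per group and calling `bit_error_rate(ratings, [(0, 3, 1), (3, 3, -1)], k=2)` measures how often the majority vote of the two nearest friends recovers the hidden rating. The regimes described in the abstract concern the limit of this kind of per-entry error rate under the paper's model, which this toy sketch does not reproduce.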
