Accurate and scalable social recommendation using mixed-membership stochastic block models

Significance

Recommendation systems are designed to predict users’ preferences and provide them with recommendations for items such as books or movies that suit their needs. Recent developments show that some probabilistic models for user preferences yield better predictions than latent feature models such as matrix factorization. However, it has not been possible to apply them to real-world datasets because they are not computationally efficient. We have developed a rigorous probabilistic model that outperforms leading approaches for recommendation and whose parameters can be fitted efficiently with an algorithm whose running time scales linearly with the size of the dataset. This model and inference algorithm open the door to more approaches to recommendation and to other problems where matrix factorization is currently used.

With increasing amounts of information available, modeling and predicting user preferences (for books or articles, for example) are becoming more important. We present a collaborative filtering model, with an associated scalable algorithm, that makes accurate predictions of users’ ratings. Like previous approaches, we assume that there are groups of users and of items and that the rating a user gives an item is determined by their respective group memberships. However, we allow each user and each item to belong simultaneously to mixtures of different groups and, unlike many popular approaches such as matrix factorization, we do not assume that users in each group prefer a single group of items. In particular, we do not assume that ratings depend linearly on a measure of similarity, but allow probability distributions of ratings to depend freely on the user’s and item’s groups. The resulting overlapping groups and predicted ratings can be inferred with an expectation-maximization algorithm whose running time scales linearly with the number of observed ratings. Our approach enables us to predict user preferences in large datasets and is considerably more accurate than the current algorithms for such large datasets.
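
The abstract does not spell out the update equations, so the following is only a minimal sketch of how an expectation-maximization fit of this kind could look. It assumes that the probability of user u giving item i the rating r is the sum over group pairs (k, l) of theta[u, k] * eta[i, l] * p[k, l, r], where theta and eta are the mixed memberships of users and items and p[k, l, :] is an unconstrained rating distribution for each pair of groups. The function names, argument names, and the random initialization are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def fit_mmsbm(users, items, ratings, n_users, n_items, K, L, R, n_iter=200, seed=0):
    """EM sketch for a mixed-membership block model of ratings (assumed form).

    users, items, ratings: 1-D integer arrays with one entry per observed
    rating; ratings are encoded as integers 0..R-1.
    Returns theta (n_users x K), eta (n_items x L), p (K x L x R).
    """
    rng = np.random.default_rng(seed)
    # Random normalized starting point (an assumption, not from the source).
    theta = rng.random((n_users, K))
    theta /= theta.sum(axis=1, keepdims=True)
    eta = rng.random((n_items, L))
    eta /= eta.sum(axis=1, keepdims=True)
    p = rng.random((K, L, R))
    p /= p.sum(axis=2, keepdims=True)

    # Number of observed ratings per user and per item.
    d_user = np.bincount(users, minlength=n_users)
    d_item = np.bincount(items, minlength=n_items)

    for _ in range(n_iter):
        # E-step: responsibility of group pair (k, l) for each observed rating.
        # omega has shape (n_ratings, K, L), so the cost per iteration is
        # proportional to the number of observed ratings.
        omega = (
            theta[users][:, :, None]
            * eta[items][:, None, :]
            * p[:, :, ratings].transpose(2, 0, 1)
        )
        omega /= omega.sum(axis=(1, 2), keepdims=True)

        # M-step: memberships are average responsibilities over each
        # user's (item's) ratings; users or items with no ratings keep
        # their previous memberships.
        new_theta = np.zeros_like(theta)
        np.add.at(new_theta, users, omega.sum(axis=2))
        theta = np.where(d_user[:, None] > 0,
                         new_theta / np.maximum(d_user, 1)[:, None], theta)

        new_eta = np.zeros_like(eta)
        np.add.at(new_eta, items, omega.sum(axis=1))
        eta = np.where(d_item[:, None] > 0,
                       new_eta / np.maximum(d_item, 1)[:, None], eta)

        # Rating distribution for each group pair: share of responsibility
        # mass that falls on each rating value.
        new_p = np.zeros((R, K, L))
        np.add.at(new_p, ratings, omega)
        new_p = new_p.transpose(1, 2, 0)
        p = new_p / np.maximum(new_p.sum(axis=2, keepdims=True), 1e-300)

    return theta, eta, p


def predict_distribution(theta, eta, p, u, i):
    """Predicted probability of each rating value for user u and item i."""
    return np.einsum("k,l,klr->r", theta[u], eta[i], p)
```

Because p[k, l, :] is a free distribution over rating values rather than a function of an inner product, this sketch does not tie ratings linearly to a similarity score, which is the distinction from matrix factorization drawn above. A point prediction could be taken as the most probable rating or the mean of the returned distribution; which of these is preferable is left open here.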