Real-time top-n recommendation in social streams

The Social Web is well established and steadily growing in users, content, and services. People generate and consume data in real time within social networking services such as Twitter, and increasingly rely on continuous streams of messages for real-time access to fresh knowledge about current affairs. In this paper, we focus on analyzing social streams in real time for personalized topic recommendation and discovery. We cast collaborative filtering as an online ranking problem and present Stream Ranking Matrix Factorization (RMFX), which uses a pairwise approach to matrix factorization to optimize the personalized ranking of topics. Our novel approach follows a selective sampling strategy, based on active learning principles, to perform online model updates; this strategy closely simulates the task of identifying relevant items from a pool of mostly uninteresting ones. RMFX is particularly suitable for large-scale applications. Experiments on the "476 million Twitter tweets" dataset show that our online approach largely outperforms recommendations based on Twitter's global trend. It also delivers highly competitive Top-N recommendations faster and with less space than Weighted Regularized Matrix Factorization (WRMF), a state-of-the-art matrix factorization technique for collaborative filtering.
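To make the idea of pairwise matrix factorization with online updates concrete, the following is a minimal sketch and not the RMFX implementation itself. It assumes a hinge loss on the score difference between an observed (positive) item and a sampled (negative) item, plain SGD with L2 regularization, and uniform negative sampling from items the user has not interacted with. The names `PairwiseMF` and `train_on_stream` and all hyperparameter values are hypothetical; the selective sampling strategy described above would replace the uniform `rng.choice` call.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class PairwiseMF:
    """Toy pairwise-ranking matrix factorization, updated one event at a time."""

    def __init__(self, n_factors=8, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = n_factors, lr, reg
        self.rng = random.Random(seed)
        self.users, self.items = {}, {}

    def _vec(self, table, key):
        # Lazily create a small random factor vector for unseen users/items.
        if key not in table:
            table[key] = [self.rng.gauss(0, 0.1) for _ in range(self.k)]
        return table[key]

    def score(self, user, item):
        return dot(self._vec(self.users, user), self._vec(self.items, item))

    def update(self, user, pos_item, neg_item):
        """One SGD step on the hinge loss max(0, 1 - (s_pos - s_neg))."""
        w = self._vec(self.users, user)
        hp = self._vec(self.items, pos_item)
        hn = self._vec(self.items, neg_item)
        if dot(w, hp) - dot(w, hn) >= 1.0:
            return  # pair already ranked with enough margin: no update
        for f in range(self.k):
            wf, pf, nf = w[f], hp[f], hn[f]
            w[f] += self.lr * ((pf - nf) - self.reg * wf)
            hp[f] += self.lr * (wf - self.reg * pf)
            hn[f] += self.lr * (-wf - self.reg * nf)

    def top_n(self, user, n):
        """Return the n known items with the highest predicted score."""
        return sorted(self.items, key=lambda i: self.score(user, i), reverse=True)[:n]


def train_on_stream(model, events, catalog, rng):
    """Consume (user, item) events in order, one pairwise update per event."""
    seen = {}
    for user, item in events:
        seen.setdefault(user, set()).add(item)
        # Assumption: negatives are drawn uniformly from unseen items.
        candidates = [i for i in catalog if i not in seen[user]]
        if candidates:
            model.update(user, item, rng.choice(candidates))
```

Each incoming event triggers at most one constant-cost update, so the model never needs a full pass over past data; `top_n` can be called at any point in the stream to rank the items seen so far for a given user.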
