RLCF: A collaborative filtering approach based on reinforcement learning with sequential ratings

Abstract

We present RLCF, a novel approach to collaborative filtering that accounts for the dynamics of user ratings. RLCF applies reinforcement learning to each user's sequence of ratings. First, we formalize the collaborative filtering problem as a Markov Decision Process. Then, we use Q-learning to learn how a user's earlier ratings in the temporal sequence relate to later ones. Experiments demonstrate the feasibility of our approach and show a tight relationship between past and current ratings. We also propose an ensemble learning scheme for RLCF and demonstrate that it improves performance.
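To make the idea of Q-learning over rating sequences concrete, the following is a minimal sketch and not the formulation used in the paper. The abstract does not specify the states, actions, or rewards, so everything here is an assumption chosen for illustration: the state is the window of the last `k` ratings, the action is the predicted next rating, and the reward is the negative absolute prediction error. The names `train_q` and `predict`, the 1 to 5 rating scale, and all hyperparameters are hypothetical.

```python
# Illustrative sketch only: the state, action, and reward definitions
# below are assumptions, not the definitions used by RLCF.

RATINGS = [1, 2, 3, 4, 5]  # assumed rating scale


def train_q(sequences, k=2, alpha=0.1, gamma=0.9, epochs=20):
    """Tabular Q-learning over per-user rating sequences.

    State  (assumed): tuple of the previous k ratings.
    Action (assumed): the rating predicted for the next item.
    Reward (assumed): negative absolute error against the true rating.
    """
    q = {}  # maps (state, action) -> estimated value
    for _ in range(epochs):
        for seq in sequences:
            for t in range(k, len(seq)):
                state = tuple(seq[t - k:t])
                next_state = tuple(seq[t - k + 1:t + 1])
                # Best value reachable from the next state (0.0 if unseen).
                best_next = max(q.get((next_state, b), 0.0) for b in RATINGS)
                # The data is logged, so the reward of every candidate
                # action is known; update each one with the Q-learning rule.
                for action in RATINGS:
                    reward = -abs(action - seq[t])
                    old = q.get((state, action), 0.0)
                    target = reward + gamma * best_next
                    q[(state, action)] = old + alpha * (target - old)
    return q


def predict(q, state):
    """Return the rating with the highest learned value for this state."""
    return max(RATINGS, key=lambda a: q.get((tuple(state), a), 0.0))


# Usage: each inner list is one user's ratings in chronological order.
q_table = train_q([[5, 5, 5, 5, 5, 5], [1, 1, 1, 1, 1, 1]])
next_rating = predict(q_table, (5, 5))
```

Because the next state in this sketch depends only on the observed rating and not on the chosen action, the update sweeps all candidate actions per transition instead of exploring; a different MDP design would change that choice.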
