Partially Observable Markov Decision Process for Recommender Systems

We report the "Recurrent Deterioration" (RD) phenomenon observed in online recommender systems. The RD phenomenon is the trend of performance degradation that arises when the recommendation model is repeatedly trained on users' feedback on its own previous recommendations. Several factors cause recommender systems to encounter the RD phenomenon, including the lack of negative training data and the evolution of users' interests. Motivated to tackle the problems behind the RD phenomenon, we propose the POMDP-Rec framework, a neural-optimized Partially Observable Markov Decision Process algorithm for recommender systems. We show that POMDP-Rec effectively uses the accumulated historical data from real-world recommender systems and automatically achieves results on public datasets comparable to those of models fine-tuned exhaustively by domain experts.
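The abstract does not specify POMDP-Rec's internals, but the core POMDP idea it builds on can be illustrated with a toy belief-state recommender: the user's true interest is a hidden state, the agent keeps a probability distribution (belief) over interests, updates it by Bayes' rule from click feedback, and recommends the item with the highest expected reward under the current belief. Everything below (the topic count, the click probabilities, the function names) is an illustrative assumption, not the paper's method:

```python
import numpy as np

# Toy POMDP-style recommender sketch (illustrative assumptions only,
# not the paper's POMDP-Rec). Hidden state: the user's interest topic.

rng = np.random.default_rng(0)

n_topics = 3  # hidden user-interest states; one item per topic (toy catalog)

# p_click[a, s] = P(click | recommend item of topic a, true interest s):
# high when the item matches the interest, low otherwise (assumed values).
p_click = np.where(np.eye(n_topics) > 0, 0.8, 0.1)

def update_belief(belief, item_topic, clicked):
    """Bayes update of the belief over hidden interests after feedback."""
    likelihood = p_click[item_topic] if clicked else 1.0 - p_click[item_topic]
    posterior = belief * likelihood
    return posterior / posterior.sum()

def recommend(belief):
    """Greedy policy: pick the topic with highest expected click probability."""
    return int(np.argmax(belief @ p_click.T))

# Simulate a user whose true (hidden) interest is topic 2.
true_interest = 2
belief = np.full(n_topics, 1.0 / n_topics)
for _ in range(20):
    a = recommend(belief)
    clicked = rng.random() < p_click[a, true_interest]
    belief = update_belief(belief, a, clicked)

print(np.round(belief, 3))  # belief should concentrate on the true interest
```

Note how negative feedback (no click) still carries information here: it lowers the posterior on the recommended topic, which is one way a belief-state formulation can compensate for the scarcity of explicit negative training data mentioned in the abstract.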
