Automatic Representation for Lifetime Value Recommender Systems

Many modern commercial sites employ recommender systems to propose relevant content to users. While most systems focus on maximizing immediate gain (clicks, purchases, or ratings), a better notion of success is the lifetime value (LTV) of the user-system interaction. The LTV approach considers the future implications of each item recommendation and seeks to maximize the cumulative gain over time. Reinforcement Learning (RL) is the standard framework for optimizing cumulative success over time. However, RL is rarely used in practice because its associated representation, optimization, and validation techniques can be complex. In this paper we propose a new architecture for combining RL with recommender systems that obviates the need for hand-tuned features, thus automating the construction of the state-space representation. We analyze the practical difficulties of this formulation and test our solutions on batch, off-line, real-world recommendation data.
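To illustrate the core distinction the abstract draws between immediate gain and lifetime value, the following is a minimal sketch (not the paper's implementation): LTV is framed as the discounted cumulative reward of a sequence of user interactions, and a recommendation policy with a lower immediate reward can still dominate a myopic one. The reward sequences and discount factor below are toy assumptions chosen for illustration.

```python
def discounted_ltv(rewards, gamma=0.9):
    """Lifetime value: the discounted sum of per-interaction gains,
    i.e. the cumulative-reward objective RL optimizes."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# A policy maximizing the immediate click: high first reward,
# but the user disengages afterwards (toy numbers).
myopic = [1.0, 0.1, 0.1, 0.1]

# A policy with lower immediate gain but sustained engagement.
farsighted = [0.5, 0.6, 0.7, 0.8]

print(discounted_ltv(myopic))      # smaller cumulative value
print(discounted_ltv(farsighted))  # larger cumulative value
```

Under these toy rewards the far-sighted sequence yields a higher LTV even though its first-step reward is half that of the myopic one, which is exactly why the abstract argues for optimizing the cumulative objective rather than the immediate one.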
