Interactive recommendation via deep neural memory augmented contextual bandits

Personalized recommendation with user interactions has become increasingly popular in many applications where content changes dynamically (news, media, etc.). Existing approaches model interactive recommendation as a contextual bandit problem to balance the trade-off between exploration and exploitation. However, these solutions require a large number of interactions with each user before they can provide high-quality personalized recommendations. To mitigate this limitation, we design a novel deep neural memory-augmented mechanism that models and tracks each user's history state based on their previous interactions. As a result, a user's preferences for new items can be learned within a small number of interactions. Moreover, we develop new algorithms that leverage the large volume of historical data from all users for offline model training and for online fine-tuning of each user's model, with a focus on policy evaluation. Extensive experiments on synthetic and real-world datasets validate that our proposed approach consistently outperforms a variety of state-of-the-art approaches.
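
To make the setup concrete, the sketch below shows one way a memory-augmented contextual bandit of this flavor could be structured. It is an illustrative assumption rather than the paper's actual model: a standard LinUCB policy is run on a feature vector that concatenates the item context with a per-user memory vector, and the memory is updated with a simple running average of reward-weighted item contexts as a toy stand-in for the neural memory module. The class name, the memory update rule, and all parameters are hypothetical.

```python
import numpy as np

class MemoryAugmentedLinUCB:
    """Hypothetical sketch: LinUCB on [item context ; per-user memory]."""

    def __init__(self, ctx_dim, alpha=1.0):
        d = 2 * ctx_dim                    # item context + memory vector
        self.alpha = alpha                 # exploration strength
        self.A = np.eye(d)                 # ridge-regression design matrix
        self.b = np.zeros(d)               # reward-weighted feature sum
        self.memory = np.zeros(ctx_dim)    # running summary of past feedback
        self.n_obs = 0

    def _features(self, item_ctx):
        return np.concatenate([np.asarray(item_ctx, dtype=float), self.memory])

    def select(self, candidate_ctxs):
        # Pick the candidate item with the highest upper confidence bound.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        feats = [self._features(c) for c in candidate_ctxs]
        ucb = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x) for x in feats]
        return int(np.argmax(ucb))

    def update(self, item_ctx, reward):
        # Fold the observed reward into the bandit statistics ...
        x = self._features(item_ctx)
        self.A += np.outer(x, x)
        self.b += reward * x
        # ... and into the user's memory state. This running average is a
        # toy stand-in for a learned neural memory module.
        self.n_obs += 1
        ctx = np.asarray(item_ctx, dtype=float)
        self.memory += (reward * ctx - self.memory) / self.n_obs


# Example interaction loop with simulated item contexts and click feedback.
rng = np.random.default_rng(0)
bandit = MemoryAugmentedLinUCB(ctx_dim=5)
for _ in range(3):
    items = rng.normal(size=(10, 5))       # 10 candidate items per round
    pick = bandit.select(items)
    reward = float(rng.random() < 0.3)      # simulated click feedback
    bandit.update(items[pick], reward)
```

In the setting described by the abstract, the running-average update would be replaced by a learned neural memory module, and the policy would be trained offline on all users' historical interactions before being fine-tuned online for each individual user.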
