DRN: A Deep Reinforcement Learning Framework for News Recommendation

In this paper, we propose a novel deep reinforcement learning framework for news recommendation. Online personalized news recommendation is a highly challenging problem due to the dynamic nature of news features and user preferences. Although some online recommendation models have been proposed to address this dynamic nature, these methods have three major issues. First, they only try to model the current reward (e.g., click-through rate). Second, very few studies consider using user feedback other than click / no-click labels (e.g., how frequently a user returns) to help improve recommendation. Third, these methods tend to keep recommending similar news to users, which may cause users to get bored. To address these challenges, we propose a Deep Q-Learning based recommendation framework that models future reward explicitly. We further consider the user return pattern as a supplement to the click / no-click label in order to capture more user feedback information. In addition, an effective exploration strategy is incorporated to find new attractive news for users. Extensive experiments conducted on an offline dataset and in the online production environment of a commercial news recommendation application show the superior performance of our method.
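To make the idea of modeling future reward concrete, the following is a minimal sketch of a generic one-step Q-learning target in which the immediate reward combines the click label with a user-return signal. It is an illustration under stated assumptions, not the framework's actual implementation: the function names, the additive form of the reward, the weight `beta`, the discount `gamma`, and their default values are all hypothetical and do not come from the abstract.

```python
from typing import Sequence


def combined_reward(clicked: bool, return_score: float, beta: float = 0.05) -> float:
    """Immediate reward: click label plus a weighted user-return signal.

    Assumption: the two feedback signals are combined additively with a
    hypothetical weight `beta`; the abstract only says that the return
    pattern supplements the click / no-click label.
    """
    click_reward = 1.0 if clicked else 0.0
    return click_reward + beta * return_score


def q_target(reward: float, next_q_values: Sequence[float], gamma: float = 0.4) -> float:
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').

    `next_q_values` holds the estimated Q-values of the candidate news
    items in the next state; an empty sequence is treated as a terminal
    state with no future reward.
    """
    future = max(next_q_values) if next_q_values else 0.0
    return reward + gamma * future


if __name__ == "__main__":
    # A clicked item, with a hypothetical return score of 0.5.
    r = combined_reward(clicked=True, return_score=0.5, beta=0.1)
    # Target used to regress Q(s, a) toward current plus discounted future reward.
    print(q_target(r, next_q_values=[0.2, 0.8], gamma=0.5))
```

A model that only optimizes the current reward corresponds to `gamma = 0` in this sketch; any positive `gamma` lets the estimated value of later recommendations influence which item is chosen now.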
