Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits

The multi-armed bandit is an important framework for balancing exploration with exploitation in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) with the highest predicted user engagement and has traditionally been the focus of recommender systems. Exploration recommends content with uncertain predicted user engagement for the purpose of gathering more information. The importance of exploration has been recognized in recent years, particularly in settings with new users, new items, non-stationary preferences and attributes. In parallel, explaining recommendations ("recsplanations") is crucial if users are to understand their recommendations. Existing work has looked at bandits and explanations independently. We provide the first method that combines both in a principled manner. In particular, our method is able to jointly (1) learn which explanations each user responds to; (2) learn the best content to recommend for each user; and (3) balance exploration with exploitation to deal with uncertainty. Experiments with historical log data and tests with live production traffic in a large-scale music recommendation service show a significant improvement in user engagement.
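
The abstract does not spell out the bandit algorithm itself, so the following is only a minimal epsilon-greedy sketch of the general idea it describes: treat each (item, explanation) pair as an arm, exploit by recommending the pair with the highest estimated engagement, and explore by occasionally recommending a random pair. The class name, the example items and explanations, and the epsilon-greedy rule are illustrative assumptions, not the paper's actual method or model.

```python
# Minimal epsilon-greedy sketch (assumed, not the paper's algorithm):
# arms are (item, explanation) pairs, rewards are observed engagement,
# and a running per-arm mean stands in for a learned engagement model.
import random
from collections import defaultdict

class ExplainableRecBandit:
    def __init__(self, items, explanations, epsilon=0.1):
        self.arms = [(i, e) for i in items for e in explanations]
        self.epsilon = epsilon            # probability of exploring
        self.counts = defaultdict(int)    # plays per (item, explanation) arm
        self.values = defaultdict(float)  # running mean engagement per arm

    def recommend(self):
        # Explore: with probability epsilon, pick a random (item, explanation) pair.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        # Exploit: otherwise pick the pair with the highest estimated engagement.
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incrementally update the mean engagement estimate for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical usage: recommend a playlist together with an explanation,
# then feed back the observed engagement (e.g., 1.0 if the user streamed it).
bandit = ExplainableRecBandit(
    items=["jazz_mix", "focus_mix", "workout_mix"],
    explanations=["Because you listened to jazz recently", "Popular near you"],
)
item, explanation = bandit.recommend()
bandit.update((item, explanation), reward=1.0)
```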
