Ensemble contextual bandits for personalized recommendation

The cold-start problem has attracted extensive attention from online services that provide personalized recommendations. Many online vendors employ contextual bandit strategies to tackle the exploration/exploitation dilemma rooted in the cold-start problem. However, given high-dimensional user/item features and the differing characteristics of bandit policies, it is often difficult for service providers to select and deploy an algorithm that yields acceptable and robust economic returns. In this paper, we explore ensemble strategies over contextual bandit algorithms to obtain robust click-through rate (CTR) predictions for web objects. The ensemble is built by aggregating the pulling policies of the base bandit algorithms, rather than by forcing agreement among their prediction results or learning a single unified predictive model. To this end, we employ a meta-bandit paradigm that places a hyper bandit over the base bandits and explicitly explores/exploits the relative importance of the base bandits based on user feedback; a sketch of this two-level scheme follows below. Extensive empirical experiments on two real-world data sets (news recommendation and online advertising) demonstrate the effectiveness of the proposed approach in terms of CTR.
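
As a rough illustration of the meta-bandit paradigm described above (not the paper's actual algorithm), here is a minimal, context-free Python sketch. The class names HyperBandit and EpsilonGreedyBandit, the Thompson-sampling hyper layer, and the Bernoulli click model are all assumptions made for illustration: the hyper bandit treats each base bandit as an arm, samples a Beta posterior over each base policy's CTR, delegates the pull to the winning base policy, and feeds the observed click back to both levels.

```python
import random


class EpsilonGreedyBandit:
    """A simple epsilon-greedy base bandit over a fixed set of arms (illustrative)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental update of the arm's mean reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class HyperBandit:
    """Hypothetical meta-bandit: treats each base bandit as an arm and uses
    Thompson sampling on Bernoulli click feedback to weight the base policies."""

    def __init__(self, base_bandits):
        self.base_bandits = base_bandits
        # Beta(1, 1) prior on each base bandit's CTR.
        self.alpha = [1.0] * len(base_bandits)
        self.beta = [1.0] * len(base_bandits)

    def recommend(self):
        # Sample a CTR estimate for each base bandit and delegate the pull
        # to the base policy with the highest sampled value.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        k = max(range(len(samples)), key=lambda i: samples[i])
        return k, self.base_bandits[k].select_arm()

    def update(self, k, arm, clicked):
        # Propagate the observed click both to the chosen base bandit and
        # to the hyper bandit's belief about that base bandit.
        self.base_bandits[k].update(arm, 1.0 if clicked else 0.0)
        if clicked:
            self.alpha[k] += 1.0
        else:
            self.beta[k] += 1.0


# Toy simulation (made-up CTRs): 5 items, two base policies with different
# exploration rates; the hyper bandit learns which base policy to trust.
random.seed(0)
true_ctr = [0.02, 0.05, 0.10, 0.04, 0.01]
hyper = HyperBandit([EpsilonGreedyBandit(5, 0.05), EpsilonGreedyBandit(5, 0.3)])
for _ in range(10000):
    k, arm = hyper.recommend()
    hyper.update(k, arm, clicked=random.random() < true_ctr[arm])
```

In this sketch the hyper layer needs no access to the base bandits' internals, only to the realized click feedback, which is what allows heterogeneous pulling policies to be aggregated without forcing agreement among their predictions.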
