On Context-Dependent Clustering of Bandits

We investigate a novel cluster-of-bandits algorithm, CAB, for collaborative recommendation tasks. CAB implements the underlying feedback-sharing mechanism by estimating user neighborhoods in a context-dependent manner. It departs sharply from the state of the art by incorporating collaborative effects into both inference and learning, seamlessly interleaving explore-exploit tradeoffs with collaborative steps. We prove regret bounds under various assumptions on the data that exhibit a crisp dependence on the expected number of clusters over the users, a natural measure of the statistical difficulty of the learning task. Experiments on production and real-world datasets show that CAB delivers significantly better prediction performance than a representative pool of state-of-the-art methods.
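To make the context-dependent neighborhood idea concrete, the following is a minimal sketch, assuming linear payoffs and per-user ridge-regression estimates with UCB-style exploration. The class name, the parameter alpha, and the averaging rule for the neighborhood are illustrative assumptions, not the paper's exact notation or implementation.

import numpy as np

class ContextualClusteringBandit:
    """Illustrative CAB-style bandit: per-user linear estimates plus
    a neighborhood that is recomputed for every context vector."""

    def __init__(self, n_users, d, alpha=1.0, reg=1.0):
        self.alpha = alpha                                   # confidence / exploration scale
        self.M = np.array([np.eye(d) * reg for _ in range(n_users)])  # per-user correlation matrices
        self.b = np.zeros((n_users, d))                      # per-user reward-weighted feature sums
        self.w = np.zeros((n_users, d))                      # per-user ridge estimates

    def _cb(self, user, x):
        # Confidence width of this user's payoff estimate at context x.
        Minv = np.linalg.inv(self.M[user])
        return self.alpha * np.sqrt(x @ Minv @ x)

    def recommend(self, user, contexts):
        best_score, best_arm, best_nbr = -np.inf, None, None
        for k, x in enumerate(contexts):
            # Context-dependent neighborhood: users whose estimated payoff on x
            # is statistically indistinguishable from the current user's.
            nbr = [j for j in range(len(self.w))
                   if abs(self.w[user] @ x - self.w[j] @ x)
                   <= self._cb(user, x) + self._cb(j, x)]
            # Aggregate the neighborhood's estimates and widths, score by UCB.
            w_bar = np.mean([self.w[j] for j in nbr], axis=0)
            cb_bar = np.mean([self._cb(j, x) for j in nbr])
            score = w_bar @ x + cb_bar
            if score > best_score:
                best_score, best_arm, best_nbr = score, k, nbr
        return best_arm, best_nbr

    def update(self, user, x, reward):
        # Ridge-regression style update for the served user; the full feedback-sharing
        # scheme would also propagate the update to confident neighbors.
        self.M[user] += np.outer(x, x)
        self.b[user] += reward * x
        self.w[user] = np.linalg.solve(self.M[user], self.b[user])

The sketch recomputes the neighborhood per context rather than maintaining a fixed user clustering, which is the distinguishing feature described in the abstract; the exact confidence widths and sharing rule in the paper differ in detail.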
