On Context-Dependent Clustering of Bandits

We investigate a novel cluster-of-bandits algorithm, CAB, for collaborative recommendation tasks. CAB implements the underlying feedback-sharing mechanism by estimating user neighborhoods in a context-dependent manner. It departs sharply from the state of the art by incorporating collaborative effects into both inference and learning, seamlessly interleaving explore-exploit tradeoffs with collaborative steps. We prove regret bounds under various assumptions on the data that exhibit a crisp dependence on the expected number of clusters over the users, a natural measure of the statistical difficulty of the learning task. Experiments on production and real-world datasets show that CAB delivers significantly better prediction performance than a representative pool of state-of-the-art methods.
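To make the context-dependent neighborhood idea concrete, the following is a minimal sketch, assuming linear payoffs and per-user ridge-regression estimates with UCB-style exploration. The class name, the parameter alpha, and the averaging rule for the neighborhood are illustrative assumptions, not the paper's exact notation or implementation.

import numpy as np

class ContextualClusteringBandit:
    """Illustrative CAB-style bandit: per-user linear estimates plus
    a neighborhood that is recomputed for every context vector."""

    def __init__(self, n_users, d, alpha=1.0, reg=1.0):
        self.alpha = alpha                                   # confidence / exploration scale
        self.M = np.array([np.eye(d) * reg for _ in range(n_users)])  # per-user correlation matrices
        self.b = np.zeros((n_users, d))                      # per-user reward-weighted feature sums
        self.w = np.zeros((n_users, d))                      # per-user ridge estimates

    def _cb(self, user, x):
        # Confidence width of this user's payoff estimate at context x.
        Minv = np.linalg.inv(self.M[user])
        return self.alpha * np.sqrt(x @ Minv @ x)

    def recommend(self, user, contexts):
        best_score, best_arm, best_nbr = -np.inf, None, None
        for k, x in enumerate(contexts):
            # Context-dependent neighborhood: users whose estimated payoff on x
            # is statistically indistinguishable from the current user's.
            nbr = [j for j in range(len(self.w))
                   if abs(self.w[user] @ x - self.w[j] @ x)
                   <= self._cb(user, x) + self._cb(j, x)]
            # Aggregate the neighborhood's estimates and widths, score by UCB.
            w_bar = np.mean([self.w[j] for j in nbr], axis=0)
            cb_bar = np.mean([self._cb(j, x) for j in nbr])
            score = w_bar @ x + cb_bar
            if score > best_score:
                best_score, best_arm, best_nbr = score, k, nbr
        return best_arm, best_nbr

    def update(self, user, x, reward):
        # Ridge-regression style update for the served user; the full feedback-sharing
        # scheme would also propagate the update to confident neighbors.
        self.M[user] += np.outer(x, x)
        self.b[user] += reward * x
        self.w[user] = np.linalg.solve(self.M[user], self.b[user])

The sketch recomputes the neighborhood per context rather than maintaining a fixed user clustering, which is the distinguishing feature described in the abstract; the exact confidence widths and sharing rule in the paper differ in detail.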
