Online Context-Dependent Clustering in Recommendations based on Exploration-Exploitation Algorithms

We investigate two context-dependent clustering techniques for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithms dynamically group users based on the items under consideration and, possibly, group items based on the similarity of the clusterings they induce over the users. The resulting algorithms thus take advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an extensive empirical analysis on real-world datasets, showing scalability and improved prediction performance over state-of-the-art methods for clustering bandits. For one of the two algorithms, we also give a regret analysis within a standard linear stochastic noise setting.
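
To make the mechanism concrete, the sketch below illustrates one way a context-dependent clustering bandit of the kind described above can operate: each user keeps a ridge-regression estimate of its payoff vector, and for every candidate item the serving user's neighborhood is recomputed from the users whose estimated payoffs on that item are statistically indistinguishable; the aggregated estimate then drives an upper-confidence item choice. This is a minimal illustrative sketch under stated assumptions, not the paper's exact algorithm: the class name, the confidence-width constant `alpha`, and the single-user update rule are placeholders chosen for brevity.

```python
import numpy as np


class ContextDependentClusteringBandit:
    """Illustrative sketch of a context-dependent clustering bandit
    (names and constants are assumptions, not the paper's notation)."""

    def __init__(self, n_users, dim, alpha=1.0):
        self.alpha = alpha
        self.n_users = n_users
        # Per-user ridge-regression state: correlation matrix M_u and vector b_u.
        self.M = np.stack([np.eye(dim) for _ in range(n_users)])
        self.b = np.zeros((n_users, dim))

    def _estimate(self, u):
        # Current least-squares estimate of user u's payoff vector.
        return np.linalg.solve(self.M[u], self.b[u])

    def _width(self, u, x):
        # Confidence width of user u's estimated payoff on item x.
        return self.alpha * np.sqrt(x @ np.linalg.solve(self.M[u], x))

    def recommend(self, user, items):
        """Pick a row of `items` for `user` via an item-dependent neighborhood."""
        best_score, best_k = -np.inf, 0
        w_user = self._estimate(user)
        for k, x in enumerate(items):
            # Neighborhood for this item: users whose estimated payoff on x
            # agrees with the serving user's within combined confidence widths.
            neighbors = [
                v for v in range(self.n_users)
                if abs(w_user @ x - self._estimate(v) @ x)
                <= self._width(user, x) + self._width(v, x)
            ]
            w_bar = np.mean([self._estimate(v) for v in neighbors], axis=0)
            cb_bar = np.mean([self._width(v, x) for v in neighbors])
            score = w_bar @ x + cb_bar  # optimism in the face of uncertainty
            if score > best_score:
                best_score, best_k = score, k
        return best_k

    def update(self, user, x, reward):
        # Standard linear-bandit update for the served user only (a full
        # implementation could also propagate the update to the neighborhood).
        self.M[user] += np.outer(x, x)
        self.b[user] += reward * x


# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
bandit = ContextDependentClusteringBandit(n_users=50, dim=10)
items = rng.standard_normal((20, 10))        # 20 candidate items
k = bandit.recommend(user=3, items=items)    # index of the chosen item
bandit.update(user=3, x=items[k], reward=1.0)
```

Recomputing the neighborhood separately for each candidate item is what makes the clustering context dependent. A full implementation along the lines described in the abstract would additionally share updates across the estimated neighborhood and, in the second variant, cluster items according to the similarity of the user clusterings they induce.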
