Spectral bandits

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this setting, each item we can recommend is a node of an undirected graph, and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
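The key modeling assumption above is that the vector of expected payoffs varies little across edges of the graph, i.e., it is dominated by the low-frequency eigenvectors of the graph Laplacian. A minimal sketch of this idea (the graph, the payoff vectors, and the node count are all illustrative choices, not taken from the paper):

```python
import numpy as np

# Illustrative graph: a path on n nodes, so "neighbors" are adjacent indices.
n = 50
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian D - A

# Eigendecomposition; eigenvalues are returned in ascending order,
# so low indices correspond to low "graph frequencies".
eigvals, eigvecs = np.linalg.eigh(L)

# A "smooth" payoff vector: a combination of low-frequency eigenvectors.
f_smooth = eigvecs[:, 1] + 0.5 * eigvecs[:, 2]
# A "rough" payoff vector: the highest-frequency eigenvector.
f_rough = eigvecs[:, -1]

# Smoothness is measured by the Laplacian quadratic form f^T L f,
# which equals the sum of squared payoff differences across edges.
def smoothness(f):
    return float(f @ L @ f)

print(smoothness(f_smooth), smoothness(f_rough))
```

For orthonormal eigenvectors, `f_smooth` has quadratic form λ₂ + 0.25·λ₃, which is tiny on a path graph, while `f_rough` has quadratic form λₙ ≈ 4. Payoffs with a small quadratic form are exactly the ones for which regret can avoid scaling with the number of nodes, since they are described by only a few spectral coefficients.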
