Cheap Bandits

We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of actions to observe represents some (geographical) area. The importance of this setting is that in these applications it is actually cheaper to observe the average reward of a group of actions than the reward of a single action. We show that when the reward is smooth over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while minimizing the sensing cost. In this paper we propose CheapUCB, an algorithm that matches the regret guarantees of the known algorithms for this setting and at the same time guarantees a linear cost gain over them. As a by-product of our analysis, we establish an Ω(√(dT)) lower bound on the cumulative regret of spectral bandits for a class of graphs with effective dimension d.
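The quantity d above is the "effective dimension" of the graph, a notion from the spectral bandits line of work: roughly, the largest d such that (d − 1)·λ_d ≤ T / log(1 + T/λ), where λ_1 ≤ … ≤ λ_N are the eigenvalues of the graph Laplacian, T is the horizon, and λ is a regularization constant. The sketch below illustrates this definition on a path graph, whose Laplacian eigenvalues have a closed form; the function name and the choice of graph are illustrative, not from the paper, and constant conventions may differ slightly between papers.

```python
import numpy as np

def effective_dimension(laplacian_eigenvalues, T, reg=1.0):
    """Largest d with (d - 1) * lambda_d <= T / log(1 + T / reg).

    Follows the effective-dimension definition used in the spectral
    bandits literature (eigenvalues sorted in increasing order).
    """
    lam = np.sort(np.asarray(laplacian_eigenvalues, dtype=float))
    budget = T / np.log(1.0 + T / reg)
    d = 1  # d = 1 always qualifies, since (1 - 1) * lambda_1 = 0
    for i in range(1, len(lam)):
        # lam[i] is lambda_{i+1} in 1-indexed notation; with d = i + 1,
        # the condition (d - 1) * lambda_d <= budget reads i * lam[i] <= budget
        if i * lam[i] <= budget:
            d = i + 1
    return d

# Example: path graph on N nodes, whose Laplacian eigenvalues are
# 4 * sin(pi * k / (2N))^2 for k = 0, ..., N-1.
N, T = 100, 1000
k = np.arange(N)
eigs = 4 * np.sin(np.pi * k / (2 * N)) ** 2
print(effective_dimension(eigs, T))
```

For well-connected graphs the eigenvalues grow quickly, so d stays much smaller than the number of actions N; this is what makes the √(dT)-type regret bounds meaningful.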
