Adaptivity to Smoothness in X-armed bandits

We study the stochastic continuum-armed bandit problem from the angle of adaptivity to the unknown regularity of the reward function f. We prove that, when performance is measured by cumulative regret, no strategy adapts optimally to the smoothness of f. We show, however, that such minimax-optimal adaptive strategies exist if the learner is given extra information about f. Finally, we complement our positive results with matching lower bounds.
