Optimal Stochastic Nonconvex Optimization with Bandit Feedback

In this paper, we analyze continuous-armed bandit problems with nonconvex cost functions under certain smoothness and sublevel set assumptions. We first derive an upper bound on the expected cumulative regret of a simple bin splitting method. We then propose an adaptive bin splitting method that can significantly improve on this performance. Finally, we derive a minimax lower bound, which shows that the new adaptive method achieves locally minimax optimal expected cumulative regret.
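To make the setting concrete, the following is a minimal sketch of a fixed-bin baseline for a continuous-armed bandit on [0, 1]. It is not the paper's algorithm: the abstract does not specify the bin splitting rule, so this sketch assumes uniform bins, Gaussian noise with a known standard deviation, and a standard lower-confidence-bound index over bin centers. The names `uniform_bin_bandit` and `example_cost`, and all parameter values, are hypothetical.

```python
import math
import random


def example_cost(x):
    # Hypothetical nonconvex cost on [0, 1]: two basins, global minimum at x = 0.25.
    return min(4.0 * (x - 0.25) ** 2, 4.0 * (x - 0.85) ** 2 + 0.2)


def uniform_bin_bandit(cost, horizon, n_bins, noise_std=0.05, seed=0):
    """Play bin centers of a uniform partition of [0, 1] using an LCB index.

    Assumption: observations are cost(x) plus Gaussian noise whose standard
    deviation noise_std is known to the learner.
    Returns the most-played bin center and the per-bin pull counts.
    """
    rng = random.Random(seed)
    centers = [(i + 0.5) / n_bins for i in range(n_bins)]
    counts = [0] * n_bins
    means = [0.0] * n_bins

    for t in range(1, horizon + 1):
        if t <= n_bins:
            # Initialization: query every bin center once.
            arm = t - 1
        else:
            # Costs are minimized, so pick the smallest lower confidence bound.
            arm = min(
                range(n_bins),
                key=lambda i: means[i]
                - noise_std * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        y = cost(centers[arm]) + rng.gauss(0.0, noise_std)
        counts[arm] += 1
        # Incremental update of the empirical mean cost of the chosen bin.
        means[arm] += (y - means[arm]) / counts[arm]

    best = max(range(n_bins), key=lambda i: counts[i])
    return centers[best], counts
```

Calling `uniform_bin_bandit(example_cost, horizon=5000, n_bins=10)` returns the center the learner played most often together with the pull counts. A fixed partition like this cannot refine its resolution near the minimizer, which is the kind of limitation an adaptive splitting scheme is meant to address.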
