Optimization of Smooth Functions With Noisy Observations: Local Minimax Rates

We consider the problem of global optimization of an unknown non-convex smooth function with noisy zeroth-order feedback. We propose a local minimax framework to study the fundamental difficulty of optimizing smooth functions with adaptive function evaluations. We show that for functions with fast growth around their global minima, carefully designed optimization algorithms can identify a near-global minimizer with many fewer queries than worst-case global minimax theory predicts. For the special case of strongly convex and smooth functions, our implied convergence rates match those developed for zeroth-order convex optimization problems. On the other hand, we show that in the worst case no algorithm can converge faster than the minimax rate of estimating an unknown function in the $\ell_\infty$-norm. Finally, we show that non-adaptive algorithms, though optimal in a global minimax sense, do not attain the optimal local minimax rate.
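
The local minimax framework can be formalized in more than one way; the following two-point formulation is an illustrative sketch in the spirit of the abstract, not necessarily the paper's exact definition. For a benchmark function $f_0$ in a smoothness class $\mathcal{F}$, one may set

\[
\mathcal{R}_n(f_0) \;=\; \sup_{f_1 \in \mathcal{F}} \; \inf_{\hat{x}_n} \; \max_{f \in \{f_0,\, f_1\}} \; \mathbb{E}_f\!\left[ f(\hat{x}_n) - \inf_{x} f(x) \right],
\]

where $\hat{x}_n$ ranges over algorithms issuing $n$ adaptive noisy queries and the loss is the optimization error at the returned point. Under a growth condition such as $f(x) - \inf f \ge c\,\|x - x^*\|^\alpha$ near the minimizer $x^*$, the hardest local alternative $f_1$ becomes easier to distinguish from $f_0$, which is how rates faster than the worst-case global minimax rate can arise.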
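
To make the adaptive-querying idea concrete, here is a minimal Python sketch of successive elimination over a one-dimensional grid of candidate minimizers. The objective, grid size, round schedule, and confidence widths are illustrative assumptions, not the algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_f(x, sigma=0.1):
    """Unknown smooth objective, observed through additive Gaussian noise."""
    return (x - 0.3) ** 2 + sigma * rng.standard_normal()

def successive_elimination(f, n_grid=64, rounds=6, batch=8, delta=0.05):
    """Adaptive zeroth-order search over a grid of candidate minimizers.

    Each round queries every surviving candidate `batch` more times and
    eliminates candidates whose empirical mean is confidently above the
    best mean, so the remaining query budget concentrates near the minimum.
    """
    xs = np.linspace(0.0, 1.0, n_grid)
    alive = np.arange(n_grid)        # indices of surviving candidates
    sums = np.zeros(n_grid)          # running sums of noisy observations
    counts = np.zeros(n_grid)        # number of queries per candidate
    for r in range(1, rounds + 1):
        for i in alive:
            sums[i] += sum(f(xs[i]) for _ in range(batch))
            counts[i] += batch
        means = sums[alive] / counts[alive]
        # Hoeffding-style confidence width (illustrative constants).
        width = np.sqrt(np.log(2 * n_grid * r / delta) / (2 * counts[alive]))
        alive = alive[means <= means.min() + 2 * width]
    return xs[alive[np.argmin(sums[alive] / counts[alive])]]

x_hat = successive_elimination(noisy_f)
print(f"estimated minimizer: {x_hat:.3f} (true minimizer: 0.300)")
```

A non-adaptive baseline would spread the same query budget uniformly over the grid; the elimination step is what lets the sample allocation adapt to the function's growth around its minimum, mirroring the abstract's contrast between adaptive and non-adaptive algorithms.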
