Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, and we analyze their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer at most a factor-of-√d degradation in convergence rate relative to traditional stochastic gradient methods, where d is the problem dimension. We complement this algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors.
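To make the two-point construction concrete, the sketch below implements a generic zero-order stochastic gradient method of the kind discussed above: at each iteration it draws a random perturbation direction, queries a pair of noisy function values, and forms a finite-difference gradient estimate. This is a minimal illustration rather than the paper's exact algorithm; the Gaussian perturbations, the toy quadratic objective, and the step-size and smoothing schedules are all assumptions made for the example.

```python
# Minimal sketch (not the paper's exact method): two-point zero-order stochastic
# gradient descent on a toy quadratic. The Gaussian perturbations, step-size and
# smoothing schedules, and the objective below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 20                                   # problem dimension
A = rng.standard_normal((d, d)) / np.sqrt(d)
x_star = rng.standard_normal(d)          # unknown minimizer

def noisy_f(x):
    """Stochastic zero-order oracle: quadratic loss plus additive noise."""
    return 0.5 * np.linalg.norm(A @ (x - x_star)) ** 2 + 0.1 * rng.standard_normal()

def two_point_gradient(x, delta):
    """Gradient estimate built from a pair of function values at x +/- delta * u."""
    u = rng.standard_normal(d)           # random perturbation direction
    return (noisy_f(x + delta * u) - noisy_f(x - delta * u)) / (2.0 * delta) * u

x = np.zeros(d)
for t in range(1, 5001):
    eta = 1.0 / np.sqrt(d * t)           # step size; the extra sqrt(d) mirrors the dimension penalty
    delta = 1.0 / t                      # smoothing parameter shrinks so the estimator's bias vanishes
    x -= eta * two_point_gradient(x, delta)

print("distance to optimum:", np.linalg.norm(x - x_star))
```

In this sketch the smoothing parameter delta is driven to zero so that the bias of the finite-difference estimate vanishes, while the 1/√(dt) step size reflects the √d slowdown relative to a first-order stochastic gradient method.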
