Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization

Two types of zeroth-order stochastic algorithms have recently been designed for nonconvex optimization, based respectively on the first-order variance reduction techniques SVRG and SARAH/SPIDER. This paper addresses several important issues that remain open for these methods. First, all existing SVRG-type zeroth-order algorithms suffer from worse function query complexities than either zeroth-order gradient descent (ZO-GD) or zeroth-order stochastic gradient descent (ZO-SGD). In this paper, we propose a new algorithm ZO-SVRG-Coord-Rand and develop a new analysis for the existing ZO-SVRG-Coord algorithm proposed in Liu et al. (2018b), and show that both ZO-SVRG-Coord-Rand and ZO-SVRG-Coord (under our new analysis) outperform other existing SVRG-type zeroth-order methods as well as ZO-GD and ZO-SGD. Second, the existing SPIDER-type algorithm SPIDER-SZO (Fang et al. 2018) has superior theoretical performance, but in practice it requires generating a large number of Gaussian random variables and uses a $\sqrt{\epsilon}$-level stepsize. In this paper, we develop a new algorithm ZO-SPIDER-Coord, which is free of Gaussian random variable generation and allows a large constant stepsize while maintaining the same convergence rate and query complexity, and we further show that ZO-SPIDER-Coord automatically achieves a linear convergence rate once the iterates enter a local Polyak-Lojasiewicz (PL) region, without restart or algorithmic modification.
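
To make the contrast concrete, the minimal Python sketch below (not the paper's implementation; the function handle f, the smoothing parameter mu, and the single-point setting are illustrative assumptions) shows a coordinate-wise finite-difference gradient estimator, which needs no random direction sampling, next to a Gaussian random-direction estimator of the kind SPIDER-SZO relies on.

```python
# Minimal sketch of the two zeroth-order gradient estimators contrasted in the
# abstract. All names (f, mu, num_dirs) are illustrative assumptions.
import numpy as np

def coord_grad_estimate(f, x, mu=1e-4):
    """Coordinate-wise estimator: 2d function queries along unit coordinate
    directions, with no random direction generation (the "-Coord" style)."""
    d = x.shape[0]
    grad = np.zeros(d)
    for i in range(d):
        e_i = np.zeros(d)
        e_i[i] = 1.0
        # central finite difference along the i-th coordinate
        grad[i] = (f(x + mu * e_i) - f(x - mu * e_i)) / (2.0 * mu)
    return grad

def gauss_grad_estimate(f, x, mu=1e-4, num_dirs=10, rng=None):
    """Gaussian random-direction estimator: each of the num_dirs queries draws
    a fresh d-dimensional Gaussian vector, which the abstract notes is costly."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    grad = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        grad += (f(x + mu * u) - f(x)) / mu * u
    return grad / num_dirs

# Example: both estimators approximate the gradient of a simple quadratic,
# whose true gradient at x is x itself.
if __name__ == "__main__":
    f = lambda x: 0.5 * np.dot(x, x)
    x0 = np.ones(5)
    print(coord_grad_estimate(f, x0))   # deterministic, ~ x0
    print(gauss_grad_estimate(f, x0))   # noisy estimate of x0
```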

[1]  Yingbin Liang,et al.  Faster Stochastic Algorithms via History-Gradient Aided Batch Size Adaptation , 2019, ArXiv.

[2]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[3]  A. Nemirovsky,et al.  Problem Complexity and Method Efficiency in Optimization , 1983 .

[4]  Francis Bach,et al.  SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.

[5]  Jian Li,et al.  A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization , 2018, NeurIPS.

[6]  Jie Liu,et al.  SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017, ICML.

[7]  Krishnakumar Balasubramanian,et al.  Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates , 2018, NeurIPS.

[8]  Sivaraman Balakrishnan,et al.  Stochastic Zeroth-order Optimization in High Dimensions , 2017, AISTATS.

[9]  Alexander J. Smola,et al.  Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.

[10]  Ananthram Swami,et al.  Practical Black-Box Attacks against Machine Learning , 2016, AsiaCCS.

[11]  H. Robbins,et al.  A Stochastic Approximation Method , 1951 .

[12]  Cho-Jui Hsieh,et al.  A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order , 2016, NIPS.

[13]  Bowen Weng,et al.  History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms , 2020, ICML.

[14]  Bin Gu,et al.  Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines , 2018, ICML.

[15]  Yi Zhou,et al.  SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization , 2018, ArXiv.

[16]  Ben Taskar,et al.  Learning structured prediction models: a large margin approach , 2005, ICML.

[17]  Mark W. Schmidt,et al.  A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets , 2012, NIPS.

[18]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[19]  Shiyu Chang,et al.  Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization , 2018, NeurIPS.

[20]  Samy Bengio,et al.  Adversarial Machine Learning at Scale , 2016, ICLR.

[21]  Martin J. Wainwright,et al.  Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations , 2013, IEEE Transactions on Information Theory.

[22]  Alexander J. Smola,et al.  Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization , 2016, NIPS.

[23]  Inderjit S. Dhillon,et al.  Recovery Guarantees for One-hidden-layer Neural Networks , 2017, ICML.

[24]  Jinfeng Yi,et al.  ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models , 2017, AISec@CCS.

[25]  Lin Xiao,et al.  Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback , 2010, COLT.

[26]  Yi Zhou,et al.  Geometrical properties and accelerated gradient solvers of non-convex phase retrieval , 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[27]  Jie Liu,et al.  Stochastic Recursive Gradient Algorithm for Nonconvex Optimization , 2017, ArXiv.

[28]  Tong Zhang,et al.  Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[29]  Yurii Nesterov,et al.  Random Gradient-Free Minimization of Convex Functions , 2015, Foundations of Computational Mathematics.

[30]  Quanquan Gu,et al.  Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization , 2018, NeurIPS.

[31]  Saeed Ghadimi,et al.  Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..

[32]  Tong Zhang,et al.  SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator , 2018, NeurIPS.

[33]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[34]  Richard E. Turner,et al.  Structured Evolution with Compact Architectures for Scalable Policy Optimization , 2018, ICML.

[35]  Liu Liu,et al.  Stochastic Zeroth-order Optimization via Variance Reduction method , 2018, ArXiv.

[36]  Lam M. Nguyen,et al.  SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017 .

[37]  Ohad Shamir,et al.  On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization , 2012, COLT.

[38]  Saeed Ghadimi,et al.  Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.

[39]  Adam Tauman Kalai,et al.  Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.

[40]  Boris Polyak.  Gradient methods for the minimisation of functionals , 1963 .

[41]  Xiang Gao,et al.  On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective , 2017, Journal of Scientific Computing.

[42]  Zeyuan Allen-Zhu,et al.  Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.