Improve Single-Point Zeroth-Order Optimization Using High-Pass and Low-Pass Filters

Single-point zeroth-order optimization (SZO) is well suited to online black-box optimization and simulation-based learning-to-control problems. However, the vanilla SZO method suffers from a large variance and slow convergence, which seriously limits its practical application. Meanwhile, extremum seeking (ES) control is regarded as the continuous-time counterpart of SZO, yet the two have mostly been studied separately in the control and optimization communities despite this close relation. In this work, we borrow the idea of high-pass and low-pass filters from ES control to improve the performance of SZO. Specifically, we develop a novel SZO method called HLF-SZO by integrating a high-pass filter and a low-pass filter into the vanilla SZO method. Interestingly, the integration of a high-pass filter turns out to coincide with the residual-feedback SZO method, and the integration of a low-pass filter can be interpreted as the momentum method. We prove that HLF-SZO achieves an O(d/T^{2/3}) convergence rate for Lipschitz and smooth objective functions (in both convex and nonconvex cases). Extensive numerical experiments show that the high-pass filter can significantly reduce the variance and the low-pass filter can accelerate convergence. As a result, the proposed HLF-SZO has a much smaller variance and much faster convergence than the vanilla SZO method, and empirically outperforms the state-of-the-art residual-feedback SZO method.
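The update described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: the function signature, hyperparameter values, and variable names are assumptions. The high-pass filter reuses the previous single-point function evaluation (the residual-feedback estimator), and the low-pass filter acts as a momentum term on the gradient estimate:

```python
import numpy as np

def hlf_szo(f, x0, T=5000, delta=0.05, eta=0.005, beta=0.9, seed=0):
    """Illustrative sketch of single-point zeroth-order optimization with a
    high-pass filter (residual feedback) and a low-pass filter (momentum).
    Hyperparameter values are arbitrary choices for demonstration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    m = np.zeros(d)            # low-pass filter state (momentum buffer)
    y_prev = f(x)              # stored previous single-point evaluation
    for _ in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)               # direction on the unit sphere
        y = f(x + delta * u)                 # ONE new function query per step
        g = (d / delta) * (y - y_prev) * u   # residual-feedback (high-pass) estimate
        y_prev = y
        m = beta * m + g                     # momentum (low-pass) smoothing
        x = x - eta * m
    return x
```

For example, running this sketch on the quadratic f(x) = ||x||^2 from x0 = (1, 1) drives the iterate toward the minimizer at the origin; the residual term keeps the single-query gradient estimate from blowing up, while the momentum average damps the remaining noise.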
