Distributed Zero-Order Optimization under Adversarial Noise

We study distributed zero-order optimization for a class of strongly convex functions formed as the average of local objectives, each associated with a node of a prescribed network. We propose a distributed zero-order projected gradient descent algorithm to solve this problem, in which information may be exchanged only between neighbouring nodes. An important feature of our procedure is that it queries only function values, subject to a general noise model that requires neither zero-mean nor independent errors. We derive upper bounds on the average cumulative regret and the optimization error of the algorithm, which highlight the roles of a network connectivity parameter, the number of variables, the noise level, the strong convexity parameter, and the smoothness properties of the local objectives. These bounds indicate key improvements of our method over the state of the art, in both the distributed and the standard zero-order optimization settings. We also discuss lower bounds and observe that the dependency of our bounds on certain function parameters is nearly optimal.
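To make the setting concrete, the problem is to minimize $f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x)$ over a convex constraint set, where node $i$ can only query noisy values of its local objective $f_i$ and communicate with its network neighbours. The following is a minimal sketch of one possible instantiation of such a scheme, not the paper's exact algorithm: it assumes a one-point spherical gradient estimator, a doubly stochastic mixing matrix `W` encoding the network, a Euclidean-ball constraint set, and illustrative step-size and smoothing schedules (`eta`, `h`); all function names here are hypothetical.

```python
import numpy as np

def zo_gradient_estimate(f_query, x, h, d, rng):
    """One-point zero-order gradient estimate along a random direction.

    f_query returns a (possibly adversarially) noisy function value;
    this estimator form needs no zero-mean or independence assumption.
    """
    zeta = rng.standard_normal(d)
    zeta /= np.linalg.norm(zeta)           # uniform direction on the unit sphere
    return (d / h) * f_query(x + h * zeta) * zeta

def project_ball(x, radius):
    """Euclidean projection onto a ball (an illustrative constraint set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def distributed_zo_pgd(f_queries, W, d, T, radius=1.0, seed=0):
    """Distributed zero-order projected gradient descent (illustrative sketch).

    f_queries: list of noisy value oracles, one per node.
    W: doubly stochastic mixing matrix with W[i, j] > 0 only if
       nodes i and j are neighbours in the network.
    """
    rng = np.random.default_rng(seed)
    n = len(f_queries)
    X = np.zeros((n, d))                   # one iterate per node
    for t in range(1, T + 1):
        eta = 1.0 / t                      # Theta(1/t) steps, as strong convexity suggests
        h = t ** (-0.25)                   # shrinking smoothing parameter (assumed schedule)
        G = np.stack([zo_gradient_estimate(f_queries[i], X[i], h, d, rng)
                      for i in range(n)])
        X = W @ X - eta * G                # gossip averaging with neighbours, then local step
        X = np.apply_along_axis(project_ball, 1, X, radius)
    return X.mean(axis=0)                  # average of the nodes' final iterates
```

Each iteration uses exactly one function evaluation per node, and all coordination happens through the single multiplication by `W`, which only mixes iterates of neighbouring nodes.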
