Annealing for Distributed Global Optimization

The paper proves convergence to global optima for a class of distributed algorithms for nonconvex optimization in network-based multi-agent settings. Agents communicate over a time-varying undirected graph, and each agent possesses a local objective function that is assumed to be smooth but may be nonconvex. The goal is to optimize the sum of the local objectives. A distributed algorithm of the consensus + innovations type is proposed that relies on first-order information at the agent level. Under appropriate conditions on network connectivity and the objective functions, an annealing-type approach, in which decaying Gaussian noise is added independently to each agent's update step, ensures that the algorithm converges in probability to the set of global minima of the sum function.
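The update described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the step-size schedules, so the weights `alpha`, `beta`, and `gamma` below, the local objectives, the four-agent ring, and the random edge activation are all assumptions chosen for illustration. Each agent combines a consensus term (disagreement with its current neighbors), an innovation term (its own local gradient), and independent Gaussian noise whose scale decays over time.

```python
import numpy as np

# Hypothetical local objectives f_i(x) = 0.25*(x^2 - 1)^2 + b_i*x (smooth, nonconvex).
# Their sum has a global minimum near x = -1 and a non-global local minimum near x = +1.
B = np.array([0.3, -0.1, 0.2, 0.0])
RING_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]  # assumed communication topology


def local_grad(x):
    """Gradient of each agent's local objective at that agent's own estimate."""
    return x**3 - x + B


def run_annealed_consensus(num_steps=5000, seed=0, t0=10):
    """Consensus + innovations iteration with decaying Gaussian noise (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.ones(len(B))  # start every agent in the non-global well near x = +1
    for k in range(num_steps):
        t = k + t0  # offset keeps log(log(t)) positive
        alpha = 1.0 / t                               # innovation (gradient) weight, assumed
        beta = 0.2 / t**0.3                           # consensus weight, assumed
        gamma = 1.0 / np.sqrt(t * np.log(np.log(t)))  # annealing noise scale, assumed
        # Time-varying undirected graph: each ring edge is active with probability 1/2.
        consensus = np.zeros_like(x)
        for i, j in RING_EDGES:
            if rng.random() < 0.5:
                consensus[i] += x[i] - x[j]
                consensus[j] += x[j] - x[i]
        noise = rng.standard_normal(len(x))  # independent noise for each agent
        x = x - beta * consensus - alpha * local_grad(x) + gamma * noise
    return x


if __name__ == "__main__":
    print(run_annealed_consensus())
```

Calling `run_annealed_consensus()` returns one estimate per agent. The convergence result stated in the abstract is asymptotic and in probability, so a single finite run of this sketch is not guaranteed to end in the global well.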
