Asynchronous Saddle Point Algorithm for Stochastic Optimization in Heterogeneous Networks

We consider expected risk minimization in multi-agent systems composed of distinct subsets of agents operating without a common time scale. Each agent in the network is charged with minimizing the global objective function, which is the sum of the statistical average loss functions of all agents in the network. Since agents are not assumed to observe data from identical distributions, the hypothesis that all agents seek a common action is violated, and so is the hypothesis upon which consensus constraints are formulated. We therefore consider nonlinear network proximity constraints, which incentivize nearby nodes to make decisions that are close to one another but do not necessarily coincide. Moreover, agents are not assumed to receive their sequentially arriving observations on a common time index, and thus must learn asynchronously. To solve this problem, we propose an asynchronous stochastic variant of the Arrow–Hurwicz saddle point method, which alternates primal stochastic descent steps with Lagrange multiplier updates that penalize the discrepancies between agents. The resulting implementation allows each agent to operate asynchronously using only local information and message passing with neighbors. Our main result establishes that the proposed method converges in expectation, in terms of both primal sub-optimality and constraint violation, to radii of sizes ${\mathcal O}(\sqrt{T})$ and ${\mathcal O}(T^{3/4})$, respectively. Empirical evaluation on an asynchronously operating wireless network that manages user channel interference through an adaptive communications pricing mechanism demonstrates that our theoretical results translate well to practice.
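The alternation between primal stochastic descent and Lagrange multiplier updates described above can be illustrated with a minimal sketch. It is not the algorithm as specified in the paper: the scalar quadratic losses, the edge constraint $(x_i - x_j)^2 \le \gamma$, the random-activation model of asynchrony with stale neighbor values, and every name and parameter (`async_saddle_point`, `targets`, `gamma`, `step`, `p_active`) are assumptions chosen only for illustration.

```python
import random


def async_saddle_point(targets, edges, gamma=0.25, step=0.05,
                       p_active=0.7, iters=4000, noise=0.1, seed=0):
    """Toy asynchronous stochastic primal-dual (Arrow-Hurwicz style) loop.

    Assumed setup (hypothetical, for illustration only):
      - agent i holds a scalar x_i with local loss E[0.5 * (x_i - s_i)^2],
        where s_i = targets[i] + Gaussian noise;
      - each edge (a, b) carries the proximity constraint
        (x_a - x_b)^2 - gamma <= 0 with multiplier lam[(a, b)];
      - asynchrony is modeled by activating each agent independently with
        probability p_active per tick; neighbors only see the last value
        an agent broadcast, which may be stale.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n                    # primal variables, one per agent
    seen = [0.0] * n                 # last broadcast value of each agent
    lam = {e: 0.0 for e in edges}    # one multiplier per edge constraint

    for _ in range(iters):
        active = [rng.random() < p_active for _ in range(n)]
        new_x = list(x)

        # Primal step: stochastic gradient descent on the Lagrangian,
        # using only local data and the neighbors' broadcast values.
        for i in range(n):
            if not active[i]:
                continue
            sample = targets[i] + rng.gauss(0.0, noise)
            grad = x[i] - sample
            for (a, b) in edges:
                if i == a:
                    grad += 2.0 * lam[(a, b)] * (x[i] - seen[b])
                elif i == b:
                    grad += 2.0 * lam[(a, b)] * (x[i] - seen[a])
            new_x[i] = x[i] - step * grad

        # Dual step: projected ascent on the multiplier of each edge,
        # performed by the edge's first endpoint when it is active.
        for (a, b) in edges:
            if active[a]:
                violation = (x[a] - seen[b]) ** 2 - gamma
                lam[(a, b)] = max(0.0, lam[(a, b)] + step * violation)

        x = new_x
        for i in range(n):
            if active[i]:
                seen[i] = x[i]       # broadcast the updated decision

    return x, lam


if __name__ == "__main__":
    # Three agents on a line graph whose local optima differ by 1.0.
    x, lam = async_saddle_point([0.0, 1.0, 2.0], [(0, 1), (1, 2)])
    print("decisions:", x)
    print("multipliers:", lam)
```

In this toy instance the unconstrained local minimizers are 1.0 apart, while the constraint asks neighbors to stay within $\sqrt{\gamma} = 0.5$ of one another, so the multipliers grow until neighboring decisions are pulled together without being forced to coincide. The constant step size means the iterates settle in a neighborhood of the saddle point rather than converging exactly.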
