Decentralized Non-Convex Learning with Linearly Coupled Constraints

Motivated by the need for decentralized learning, this paper designs a distributed algorithm for solving nonconvex problems with general linear constraints over a multi-agent network. In the considered problem, each agent owns some local information and a local variable, and the agents jointly minimize a cost function while their local variables are coupled by linear constraints. Most existing methods for such problems apply only to convex problems or to problems with specific linear constraints. A distributed algorithm that handles general linear constraints in the nonconvex setting is still lacking. To fill this gap, we propose a new algorithm, called the proximal dual consensus (PDC) algorithm, which combines a proximal technique with a dual consensus method. We show that the proposed PDC algorithm can generate an $\epsilon$-Karush-Kuhn-Tucker solution in $\mathcal{O}(1/\epsilon)$ iterations, achieving the lower bound for non-convex problems. Numerical results demonstrate the good performance of the proposed algorithm on a regression problem and a neural-network-based classification problem over a multi-agent learning network.
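For orientation, the problem class described above can be sketched as follows. This is only an illustrative form, not the paper's exact formulation: the symbols $N$ (number of agents), $f_i$ (local cost of agent $i$), $x_i$ (its local variable), $A_i$, $b$ (coupling data), and $\lambda$ (a multiplier for the coupling constraint) are assumptions introduced here, the $f_i$ are assumed smooth, and only equality coupling is shown, whereas the paper's general linear constraints may be broader.

```latex
% Illustrative sketch only; all symbols are assumed, not taken from the paper.
\begin{align*}
  \min_{x_1,\dots,x_N} \quad & \sum_{i=1}^{N} f_i(x_i) \\
  \text{s.t.} \quad & \sum_{i=1}^{N} A_i x_i = b .
\end{align*}
% One common notion of an approximate KKT point with tolerance \epsilon
% (the paper's precise definition may differ):
\begin{align*}
  \bigl\| \nabla f_i(x_i) + A_i^{\top} \lambda \bigr\| &\le \epsilon,
    \quad i = 1,\dots,N, \\
  \Bigl\| \sum_{i=1}^{N} A_i x_i - b \Bigr\| &\le \epsilon .
\end{align*}
```

Under this reading, each agent can evaluate its own stationarity residual locally, while the feasibility residual depends on all agents, which is why a consensus step on the dual side is a natural fit.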
