Regularized Diffusion Adaptation via Conjugate Smoothing

The purpose of this work is to develop and study a distributed strategy for Pareto optimization of an aggregate cost consisting of regularized risks. Each risk is modeled as the expectation of some loss function with respect to an unknown probability distribution, while the regularizers are assumed deterministic but are not required to be differentiable or even continuous. The individual regularized cost functions are distributed across a strongly connected network of agents, and the Pareto optimal solution is sought by appealing to a multi-agent diffusion strategy. To this end, the regularizers are smoothed by means of infimal convolution, and it is shown that the Pareto solution of the approximate, smooth problem can be made arbitrarily close to the solution of the original, non-smooth problem. Performance bounds are established under conditions that are weaker than those previously assumed in the literature, and are hence applicable to a broader class of adaptation and learning problems.
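The smoothing and diffusion ideas summarized above can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes, purely for illustration, the same l1 regularizer h(w) = rho*||w||_1 at every agent, least-mean-squares risks driven by synthetic Gaussian data, a ring network with symmetric Metropolis combination weights (so the sketch targets the uniformly weighted Pareto point), and an adapt-then-combine (ATC) diffusion recursion. All function names and parameter values (`rho`, `delta`, `mu`, `atc_diffusion`) are hypothetical. For the l1 norm, the infimal convolution of h with ||.||^2/(2*delta) (its Moreau envelope) has a closed Huber form, its gradient is (w - prox_{delta*h}(w))/delta, and the approximation gap satisfies 0 <= h(w) - h_delta(w) <= M*delta*rho^2/2 for w in R^M. This is the sense in which the smooth problem approaches the original one as delta goes to 0.

```python
import numpy as np


def l1_penalty(w, rho):
    """Original non-smooth regularizer h(w) = rho * ||w||_1 (illustrative choice)."""
    return rho * np.abs(w).sum()


def l1_envelope(w, rho, delta):
    """Moreau envelope inf_z { rho*||z||_1 + ||w - z||^2 / (2*delta) }.

    For the l1 norm this infimal convolution is a coordinate-wise Huber function.
    """
    a = np.abs(w)
    vals = np.where(a <= delta * rho, a**2 / (2 * delta), rho * a - delta * rho**2 / 2)
    return vals.sum()


def l1_envelope_grad(w, rho, delta):
    """Gradient (w - prox_{delta*h}(w)) / delta, which reduces to clip(w/delta, -rho, rho)."""
    return np.clip(w / delta, -rho, rho)


def metropolis_ring(N):
    """Symmetric, doubly stochastic Metropolis weights on a ring (requires N >= 3)."""
    A = np.zeros((N, N))
    for k in range(N):
        for l in ((k - 1) % N, (k + 1) % N):
            A[l, k] = 1.0 / 3.0  # 1 / (1 + max(degree_k, degree_l)) with degree 2
        A[k, k] = 1.0 / 3.0      # remaining mass kept by the agent itself
    return A


def atc_diffusion(w_true, N=10, iters=3000, mu=0.01, rho=0.01, delta=0.05, seed=0):
    """Hypothetical ATC diffusion on LMS risks plus the smoothed l1 regularizer."""
    rng = np.random.default_rng(seed)
    M = w_true.size
    A = metropolis_ring(N)
    W = np.zeros((N, M))  # row k holds agent k's current estimate
    for _ in range(iters):
        U = rng.standard_normal((N, M))                  # one regressor per agent
        d = U @ w_true + 0.01 * rng.standard_normal(N)   # noisy measurements
        err = d - np.sum(U * W, axis=1)                  # e_k = d_k - u_k^T w_k
        grad = -U * err[:, None] + l1_envelope_grad(W, rho, delta)
        Psi = W - mu * grad                              # adaptation step
        W = A.T @ Psi                                    # w_k = sum_l a_lk * psi_l
    return W


w_true = np.array([1.0, 0.0, -0.5, 0.0, 0.0])
W = atc_diffusion(w_true)
print(np.round(W.mean(axis=0), 3))
```

The step size is kept below 2*delta because the envelope gradient is (1/delta)-Lipschitz; shrinking `delta` tightens the approximation gap but forces a smaller `mu`. With these settings each agent's estimate should settle within a few hundredths of `w_true`, with the nonzero entries shrunk by roughly `rho`, the bias introduced by the regularizer itself.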