Proximal Multitask Learning Over Networks With Sparsity-Inducing Coregularization

In this work, we consider multitask learning problems where each cluster of nodes is interested in estimating its own parameter vector. Cooperation among clusters is beneficial when the optimal models of adjacent clusters share a large number of similar entries. We propose a fully distributed algorithm for solving this problem. The approach relies on minimizing a global mean-square error criterion regularized by nondifferentiable terms that promote cooperation among neighboring clusters. A general diffusion forward-backward splitting strategy is introduced and then specialized to the case of sparsity-promoting regularizers. To achieve higher efficiency, a closed-form expression is derived for the proximal operator of a weighted sum of ℓ1-norms. We also provide conditions on the step-sizes that ensure convergence of the algorithm in the mean and mean-square-error senses. Simulations are conducted to illustrate the effectiveness of the strategy.
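The proximal operator of a weighted sum of ℓ1-norms can be illustrated with a short sketch. The sketch assumes the coregularizer at a node has the form λ Σ_j a_j ||w − w_j||_1, with nonnegative weights a_j and neighbor estimates w_j. All function and parameter names are hypothetical.

Because this term is separable across entries, the proximal operator reduces to one scalar problem per entry. The sketch solves each scalar problem with a median formula: after sorting the centers c_j, the minimizer is the median of the N centers together with the N+1 shifted points u − λS_k. This is one way to compute the operator, and it may be expressed differently from the closed form derived in the paper.

`forward_backward_step` shows a single forward (LMS gradient) step followed by the backward (proximal) step at one node. It omits the diffusion combination over neighbors, so it is not the full algorithm.

```python
from statistics import median


def prox_weighted_l1_scalar(u, centers, weights, lam):
    """Hypothetical helper: argmin_x 0.5*(x - u)**2 + lam * sum_j weights[j]*|x - centers[j]|.

    Assumes lam >= 0 and weights[j] >= 0. Uses a median formula: with the
    centers sorted, S_k = sum_{j<=k} a_j - sum_{j>k} a_j (a_j = lam*weights[j]),
    the minimizer is the median of the centers and the points u - S_k.
    """
    if lam < 0 or any(a < 0 for a in weights):
        raise ValueError("lam and weights must be nonnegative")
    pairs = sorted(zip(centers, weights))  # sort centers, keep their weights
    c = [p[0] for p in pairs]
    a = [lam * p[1] for p in pairs]
    total = sum(a)
    shifted = []
    prefix = 0.0  # running sum a_1 + ... + a_k
    for k in range(len(a) + 1):
        # S_k = prefix - (total - prefix) = 2*prefix - total
        shifted.append(u - (2.0 * prefix - total))
        if k < len(a):
            prefix += a[k]
    # 2N+1 values, so the median is a single element of the list
    return median(c + shifted)


def prox_weighted_l1(u, centers, weights, lam):
    """Entrywise prox of lam * sum_j weights[j] * ||w - centers[j]||_1 (separable)."""
    return [
        prox_weighted_l1_scalar(u_i, [c[i] for c in centers], weights, lam)
        for i, u_i in enumerate(u)
    ]


def forward_backward_step(w, x, d, mu, neighbor_estimates, weights, eta):
    """One hypothetical forward-backward step at a single node (no combination step)."""
    # Forward step: LMS update on 0.5*(d - x^T w)^2, i.e. w + mu * e * x
    err = d - sum(xi * wi for xi, wi in zip(x, w))
    psi = [wi + mu * err * xi for wi, xi in zip(w, x)]
    # Backward step: prox of mu*eta*sum_j weights[j]*||w - w_j||_1 evaluated at psi
    return prox_weighted_l1(psi, neighbor_estimates, weights, mu * eta)
```

The median formula works because x ↦ x + λ Σ_j a_j sign(x − c_j) is strictly increasing. The optimality condition therefore has exactly one root, and counting the candidate values on either side of that root shows it is the middle one of the 2N+1 candidates. With a single center, the formula reduces to soft-thresholding around that center.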