Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs

We consider the fully decentralized machine learning scenario where many users with personal datasets collaborate to learn models through local peer-to-peer exchanges, without a central coordinator. We propose to train personalized models that leverage a collaboration graph describing the relationships between the users' personal tasks, which we learn jointly with the models. Our fully decentralized optimization procedure alternates between training nonlinear models given the graph in a greedy boosting manner, and updating the collaboration graph (with controlled sparsity) given the models. Throughout the process, users exchange messages only with a small number of peers (their direct neighbors in the graph and a few random users), ensuring that the procedure naturally scales to large numbers of users. We analyze the convergence rate, memory, and communication complexity of our approach, and demonstrate its benefits compared to competing techniques on synthetic and real datasets.
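To make the alternating scheme concrete, here is a minimal Python sketch of the two interleaved steps the abstract describes: a local model update given the current graph, and a sparse graph update given the current models. All names (`model_step`, `graph_step`) and the modeling choices (a linear model with a graph-smoothness penalty standing in for the paper's greedy boosting step, and a k-nearest-models rule standing in for its sparsity-controlled graph update) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, dim = 20, 5
# Toy personal datasets: user i has features X[i] and labels y[i]
# drawn from a user-specific linear model (a stand-in for real tasks).
true_w = rng.normal(size=(n_users, dim))
X = [rng.normal(size=(30, dim)) for _ in range(n_users)]
y = [X[i] @ true_w[i] + 0.1 * rng.normal(size=30) for i in range(n_users)]

models = np.zeros((n_users, dim))   # personal model parameters
W = np.zeros((n_users, n_users))    # collaboration graph weights


def model_step(i, lr=0.05, mu=1.0):
    """One local update for user i: gradient of its empirical loss plus
    a smoothness term pulling its model toward its neighbors' models
    (a simple proxy for the paper's greedy boosting update)."""
    grad = X[i].T @ (X[i] @ models[i] - y[i]) / len(y[i])
    smooth = sum(W[i, j] * (models[i] - models[j]) for j in range(n_users))
    models[i] -= lr * (grad + mu * smooth)


def graph_step(i, k=3):
    """Update row i of W given the current models: connect user i to the
    k users with the most similar models (controlled sparsity)."""
    d = np.linalg.norm(models - models[i], axis=1)
    d[i] = np.inf                      # no self-loop
    W[i] = 0.0
    nearest = np.argsort(d)[:k]
    W[i, nearest] = np.exp(-d[nearest])  # closer models -> larger weight


for it in range(200):
    i = rng.integers(n_users)          # a random user "wakes up"
    model_step(i)                      # model update given the graph
    if it % 5 == 0:
        graph_step(i)                  # graph update given the models

print("avg training loss:",
      np.mean([np.mean((X[i] @ models[i] - y[i]) ** 2)
               for i in range(n_users)]))
```

In the sketch each update touches only one user's row of the graph and that user's model, mirroring the abstract's point that a user needs to communicate only with a small set of peers rather than with a central coordinator.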
