Federated Multi-Task Learning

Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets.

[1]  Rich Caruana,et al.  Multitask Learning , 1997, Machine-mediated learning.

[2]  Wei Hong,et al.  Proceedings of the 5th Symposium on Operating Systems Design and Implementation Tag: a Tiny Aggregation Service for Ad-hoc Sensor Networks , 2022 .

[3]  Yu Hen Hu,et al.  Vehicle classification in distributed sensor networks , 2004, J. Parallel Distributed Comput..

[4]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[5]  Wei Hong,et al.  TinyDB: an acquisitional query processing system for sensor networks , 2005, TODS.

[6]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[7]  Wei Hong,et al.  Model-based approximate querying in sensor networks , 2005, The VLDB Journal.

[8]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[9]  Lawrence Carin,et al.  Multi-Task Learning for Classification with Dirichlet Process Priors , 2007, J. Mach. Learn. Res..

[10]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[11]  Kathrin Klamroth,et al.  Biconvex sets and optimization with biconvex functions: a survey and extensions , 2007, Math. Methods Oper. Res..

[12]  Jean-Philippe Vert,et al.  Clustered Multi-Task Learning: A Convex Formulation , 2008, NIPS.

[13]  E. Xing,et al.  Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network , 2009, PLoS genetics.

[14]  Diane J. Cook,et al.  Keeping the Resident in the Loop: Adapting the Smart Home to the User , 2009, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[15]  Kees van Berkel,et al.  Multi-core for mobile phones , 2009, DATE.

[16]  Jukka K. Nurminen,et al.  Energy Efficiency of Mobile Clients in Cloud Computing , 2010, HotCloud.

[17]  Gernot Heiser,et al.  An Analysis of Power Consumption in a Smartphone , 2010, USENIX Annual Technical Conference.

[18]  Dit-Yan Yeung,et al.  A Convex Formulation for Learning Task Relationships in Multi-Task Learning , 2010, UAI.

[19]  van Ch Kees Berkel Multi-core for mobile phones (Keynote) , 2010 .

[20]  Nikolaos G. Bourbakis,et al.  A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[21]  Jiayu Zhou,et al.  Clustered Multi-Task Learning Via Alternating Structure Optimization , 2011, NIPS.

[22]  Jiayu Zhou,et al.  Integrating low-rank and group-sparse structures for robust multi-task learning , 2011, KDD.

[23]  Pradeep Ravikumar,et al.  Sparse inverse covariance matrix estimation using quadratic approximation , 2011, MLSLP.

[24]  Ingrid Verbauwhede,et al.  The communication and computation cost of wireless security: extended abstract , 2011, WiSec '11.

[25]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[26]  Judy Kay,et al.  Challenges and Solutions of Ubiquitous User Modeling , 2012, Ubiquitous Display Environments.

[27]  Hal Daumé,et al.  Learning Task Grouping and Overlap in Multi-task Learning , 2012, ICML.

[28]  Shai Shalev-Shwartz,et al.  Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..

[29]  Feng Qian,et al.  An in-depth study of LTE: effect of network protocol and application behavior on performance , 2013, SIGCOMM.

[30]  Pradeep Ravikumar,et al.  Large Scale Distributed Sparse Precision Estimation , 2013, NIPS.

[31]  Davide Anguita,et al.  A Public Domain Dataset for Human Activity Recognition using Smartphones , 2013, ESANN.

[32]  David Lillethun,et al.  Mobile fog: a programming model for large-scale applications on the internet of things , 2013, MCC '13.

[33]  Zhi-Quan Luo,et al.  A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization , 2012, SIAM J. Optim..

[34]  Avleen Singh Bijral,et al.  Mini-Batch Primal and Dual Methods for SVMs , 2013, ICML.

[35]  Thomas Hofmann,et al.  Communication-Efficient Distributed Dual Coordinate Ascent , 2014, NIPS.

[36]  Cordelia Schmid,et al.  Convolutional Kernel Networks , 2014, NIPS.

[37]  Xinhua Zhang,et al.  Convex Deep Learning via Normalized Kernels , 2014, NIPS.

[38]  Alexander J. Smola,et al.  Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising , 2014, WSDM.

[39]  Shah Atiqur Rahman,et al.  Unintrusive eating recognition using Google Glass , 2015, 2015 9th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth).

[40]  David Mateos-Núñez,et al.  Distributed optimization for multi-task learning via nuclear-norm approximation , 2015 .

[41]  Jakub Konecný,et al.  Federated Optimization: Distributed Optimization Beyond the Datacenter , 2015, ArXiv.

[42]  Fuzhen Zhuang,et al.  Collaborating between Local and Global Learning for Distributed Online Multiple Tasks , 2015, CIKM.

[43]  Michael I. Jordan,et al.  Adding vs. Averaging in Distributed Primal-Dual Optimization , 2015, ICML.

[44]  Teruo Higashino,et al.  Edge-centric Computing: Vision and Challenges , 2015, CCRV.

[45]  Mladen Kolar,et al.  Distributed Multi-Task Learning with Shared Representation , 2016, ArXiv.

[46]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.

[47]  Mladen Kolar,et al.  Distributed Multi-Task Learning , 2016, AISTATS.

[48]  David D. Cox,et al.  Tensor Switching Networks , 2016, NIPS.

[49]  Igor Carron,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016 .

[50]  Fernando José Von Zuben,et al.  Multi-task Sparse Structure Learning with Gaussian Copula Models , 2016, J. Mach. Learn. Res..

[51]  Jiayu Zhou,et al.  Asynchronous Multi-task Learning , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[52]  Sinno Jialin Pan,et al.  Distributed Multi-Task Relationship Learning , 2017, KDD.

[53]  Michael I. Jordan,et al.  CoCoA: A General Framework for Communication-Efficient Distributed Optimization , 2016, J. Mach. Learn. Res..

[54]  Ameet Talwalkar,et al.  Paleo: A Performance Model for Deep Neural Networks , 2016, ICLR.

[55]  Martin J. Wainwright,et al.  Convexified Convolutional Neural Networks , 2016, ICML.

[56]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[57]  Raja Lavanya,et al.  Fog Computing and Its Role in the Internet of Things , 2019, Advances in Computer and Electrical Engineering.