LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data

Intensive care data are valuable for improvement of health care, policy making and many other purposes. Vast amount of such data are stored in different locations, on many different devices and in different data silos. Sharing data among different sources is a big challenge due to regulatory, operational and security reasons. One potential solution is federated machine learning, which is a method that sends machine learning algorithms simultaneously to all data sources, trains models in each source and aggregates the learned models. This strategy allows utilization of valuable data without moving them. One challenge in applying federated machine learning is the possibly different distributions of data from diverse sources. To tackle this problem, we proposed an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated the performance of learning in IID and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieved higher predictive accuracy with lower computational complexity than the baseline method.

[1]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[2]  Vitaly Shmatikov,et al.  How To Backdoor Federated Learning , 2018, AISTATS.

[3]  Saeed Ghadimi,et al.  Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..

[4]  Kenneth D. Mandl,et al.  FADL: Federated-Autonomous Deep Learning for Distributed Electronic Health Record , 2018, ArXiv.

[5]  Peter Szolovits,et al.  MIMIC-III, a freely accessible critical care database , 2016, Scientific Data.

[6]  Maria-Florina Balcan,et al.  Distributed Learning, Communication Complexity and Privacy , 2012, COLT.

[7]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[8]  Kenneth D. Mandl,et al.  Confederated Machine Learning on Horizontally and Vertically Separated Medical Data for Large-Scale Health System Intelligence , 2019, ArXiv.

[9]  Peter Richtárik,et al.  Fast distributed coordinate descent for non-strongly convex losses , 2014, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[10]  Ming Zheng,et al.  Artificial neural networks condensation: A strategy to facilitate adaption of machine learning in medical settings by reducing computational burden , 2018, ArXiv.

[11]  Sarvar Patel,et al.  Practical Secure Aggregation for Federated Learning on User-Held Data , 2016, ArXiv.

[12]  Ohad Shamir,et al.  Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization , 2011, ICML.

[13]  Peter Richtárik,et al.  Distributed Coordinate Descent Method for Learning with Big Data , 2013, J. Mach. Learn. Res..

[14]  Jakub Konecný,et al.  Federated Optimization: Distributed Optimization Beyond the Datacenter , 2015, ArXiv.

[15]  Ameet Talwalkar,et al.  Federated Multi-Task Learning , 2017, NIPS.

[16]  Li Huang,et al.  Patient Clustering Improves Efficiency of Federated Machine Learning to predict mortality and hospital stay time using distributed Electronic Medical Records , 2019, J. Biomed. Informatics.

[17]  Yue Zhao,et al.  Federated Learning with Non-IID Data , 2018, ArXiv.

[18]  Daniel R. Rehak,et al.  A Model and Infrastructure for Federated Learning Content Repositories , 2005 .

[19]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[20]  Dmitriy Dligach,et al.  Two-stage Federated Phenotyping and Patient Representation Learning , 2019, BioNLP@ACL.

[21]  Alistair E. W. Johnson,et al.  The eICU Collaborative Research Database, a freely available multi-center database for critical care research , 2018, Scientific Data.

[22]  João Carlos Gluz,et al.  Interdisciplinary Journal of E-learning and Learning Objects an Agent-based Federated Learning Object Search Service , 2022 .

[23]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.