Reliability and Performance Assessment of Federated Learning on Clinical Benchmark Data

As deep learning has been applied in clinical contexts, privacy concerns have grown because large amounts of personal data are collected and processed. Federated learning (FL) has recently been proposed as a way to protect personal privacy because it does not centralize data during the training phase. In this study, we assessed the reliability and performance of FL on benchmark datasets, including MNIST and MIMIC-III. We also evaluated FL on datasets that simulate a realistic clinical data distribution. We implemented FL with a client-server architecture and tested it on modified MNIST and MIMIC-III datasets. FL delivered reliable performance on both imbalanced and extremely skewed distributions (i.e., differences across hospitals in the number of patients and in patient characteristics). These results suggest that FL can be suitable for protecting privacy when applied to medical data.
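To make the client-server setup concrete, the following is a minimal sketch of one common way such training can be organized. It is not the implementation evaluated in this study: the abstract does not state the aggregation rule, so the sketch assumes weighted parameter averaging in the style of federated averaging, and it uses a toy one-dimensional linear model in place of a neural network. All function names (`local_update`, `federated_average`, `run_rounds`, `make_client`) and all hyperparameters are hypothetical. The three simulated clients differ in both sample count and input range, loosely mimicking hospitals with different patient volumes and characteristics.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Client step: plain SGD on a toy linear model y = w * x + b.

    Only the updated parameters leave the client; the raw data stays local.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step (assumed rule): average parameters weighted by sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    )


def run_rounds(client_data, rounds=50, init=(0.0, 0.0)):
    """Alternate local training on every client with server-side averaging."""
    global_weights = init
    sizes = [len(d) for d in client_data]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in client_data]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_client(n, lo, hi, rng):
    """Hypothetical client data: n noise-free samples of y = 2x + 1, x in [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


rng = random.Random(0)
# Imbalanced and skewed: clients differ in size (200 / 20 / 5) and in input range.
clients = [
    make_client(200, 0.0, 0.3, rng),
    make_client(20, 0.3, 0.7, rng),
    make_client(5, 0.7, 1.0, rng),
]
w, b = run_rounds(clients)
print(f"learned w={w:.3f}, b={b:.3f} (data generated with w=2, b=1)")
```

In this toy setting every client shares the same underlying relationship, so the averaged model recovers it despite the skew. Real clinical data can differ in label distribution as well, which is the harder case the study examines.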
