Personalized On-Device E-Health Analytics With Decentralized Block Coordinate Descent

Driven by growing attention to personal healthcare and by the pandemic, e-health is rapidly gaining popularity. Machine learning models now improve medical diagnosis in many areas of e-health analytics. Nevertheless, in the classic cloud-based, centralized e-health paradigm, all data is stored on a central server to facilitate model training, which inevitably raises privacy concerns and incurs high latency. Distributed solutions such as Decentralized Stochastic Gradient Descent (D-SGD) have been proposed to provide safe and timely diagnostic results on personal devices. However, methods like D-SGD suffer from the vanishing gradient problem and usually progress slowly in the early training stage, which impedes both the effectiveness and the efficiency of training. In addition, existing methods tend to learn models biased towards users with dense data, compromising fairness when providing e-health analytics for minority groups. In this paper, we propose a Decentralized Block Coordinate Descent (D-BCD) learning framework that can better optimize deep neural network-based models distributed across decentralized devices for e-health analytics. As a gradient-free optimization method, Block Coordinate Descent (BCD) mitigates the vanishing gradient problem and converges faster in the early stage than conventional gradient-based optimization. To overcome potential scarcity in users' local data, we propose similarity-based model aggregation, which allows each on-device model to leverage knowledge from similar neighbor models and thereby achieve both personalization and high accuracy. Benchmarking experiments on three real-world datasets demonstrate the effectiveness and practicality of the proposed D-BCD, and an additional simulation study showcases its strong applicability in real-life e-health scenarios.
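
To make the idea of similarity-based model aggregation concrete, the following is a minimal sketch and not the paper's actual procedure. The abstract does not specify the similarity measure or the weighting scheme, so the cosine similarity over flattened parameters, the softmax weighting, and the names `aggregate_with_similar_neighbors`, `temperature`, and `self_weight` are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two flattened parameter vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def aggregate_with_similar_neighbors(local, neighbors, temperature=1.0, self_weight=0.5):
    """Blend a device's parameters with a similarity-weighted neighbor average.

    Hypothetical illustration: neighbors whose parameters resemble the local
    model get larger weights, so dissimilar neighbors influence it less.
    """
    if len(neighbors) == 0:
        return local.copy()

    # Assumed similarity measure: cosine similarity of parameter vectors.
    sims = np.array([cosine_similarity(local, n) for n in neighbors])

    # Assumed weighting: numerically stable softmax over similarities.
    logits = sims / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Weighted average of neighbor parameters, then mix with the local model.
    neighbor_avg = np.sum(weights[:, None] * np.stack(neighbors), axis=0)
    return self_weight * local + (1.0 - self_weight) * neighbor_avg


if __name__ == "__main__":
    local = np.array([1.0, 0.0])
    similar = np.array([1.0, 0.0])
    dissimilar = np.array([-1.0, 0.0])
    print(aggregate_with_similar_neighbors(local, [similar, dissimilar]))
```

In this sketch the aggregated parameters stay close to the local model when the similar neighbor dominates the weights, which is one simple way to keep a degree of personalization while still borrowing knowledge from other devices.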
