Empowering Pandemic Response with Federated Learning for Protein Sequence Data Analysis

Genomic sequencing has become more accessible thanks to advances in genomics technology. This was particularly evident during the COVID-19 pandemic, which saw massive growth in the generation of viral (SARS-CoV-2) protein sequence data. These data, however, have been underutilized as a result of certain pandemic policies and countermeasures. In some cases the barrier is not a natural limit on data availability: wary countries have deliberately restricted sharing for political or economic reasons. Not all countries can be expected to actively contribute data fairly to the scientific community, and without an incentive their participation may become passive. We therefore require a strategy that encourages nations across the globe to fairly exchange information and data on the pandemic situation. We propose a federated learning (FL)-based model to address the issue of data privacy and enable real-time surveillance of epidemics. In our FL setup, each participant trains a feed-forward neural network locally and transmits only the updated weights to a central server, which combines what was learned from the local data. A federated learning-based architecture thus offers data security, because the data never leave the premises (only parameters or weights are communicated), and encourages all countries to contribute fairly to the research. While FL is popular for image data, in this paper we apply it to bioinformatics, in particular to protein sequence classification. The results show that the FL-based model outperforms centralized, traditional deep learning models (such as CNN, GRU, LSTM, and feed-forward neural networks) in predictive accuracy and in fairness of data utilization, and that it does so while maintaining data privacy. Our findings imply that FL can address the problems of data availability and privacy without compromising results. Using factual data and tracking the evolution and spread of new SARS-CoV-2 lineages in real time can support verifiable advances in pandemic research.
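
The local-train-then-aggregate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the server is assumed to combine client weights by a sample-size-weighted average (in the style of federated averaging), which the abstract does not specify; a single sigmoid unit stands in for the feed-forward neural network; and the toy data and every function name (`local_update`, `federated_average`, `make_site_data`, `run_federated`, `accuracy`) are hypothetical.

```python
import math
import random


def predict(weights, x):
    """Sigmoid output of a single-unit model; weights[-1] is the bias."""
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def local_update(global_weights, data, lr=0.5, epochs=20):
    """Train on one site's private data, starting from the global weights.

    Only the returned weights leave the site; `data` is never shared.
    """
    w = list(global_weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of the log loss w.r.t. z
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def federated_average(client_weights, client_sizes):
    """Server step (assumed rule): average weights by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_site_data(rng, n):
    """Hypothetical toy data: label is 1 when the two features sum above 1."""
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        data.append((x, 1.0 if x[0] + x[1] > 1.0 else 0.0))
    return data


def run_federated(rounds=10, seed=0):
    """Run several communication rounds over three simulated sites."""
    rng = random.Random(seed)
    sites = [make_site_data(rng, n) for n in (40, 60, 100)]
    global_w = [0.0, 0.0, 0.0]  # two feature weights plus a bias
    for _ in range(rounds):
        updates = [local_update(global_w, site) for site in sites]
        global_w = federated_average(updates, [len(s) for s in sites])
    return global_w, sites


def accuracy(weights, data):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((predict(weights, x) > 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)
```

Calling `run_federated()` returns the aggregated weights together with the simulated per-site datasets, so `accuracy(global_w, site)` can be checked for each site separately. Weighting by sample count is one design choice among several; an unweighted mean would give small sites the same influence as large ones.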
