EXpectation Propagation LOgistic REgRession on permissioned blockCHAIN (ExplorerChain): decentralized online healthcare/genomics predictive model learning

Abstract

Objective: Predicting patient outcomes using healthcare/genomics data is an increasingly popular and important area. However, some diseases are rare and require data from multiple institutions to construct generalizable models. To address institutional data protection policies, many distributed methods keep the data local but rely on a central server for coordination, which introduces risks such as a single point of failure. We focus on providing an alternative based on a decentralized approach. We introduce the idea of using blockchain technology for this purpose, with a brief description of its potential advantages and disadvantages.

Materials and Methods: We explain how our proposed EXpectation Propagation LOgistic REgRession on Permissioned blockCHAIN (ExplorerChain) can achieve the same results as a distributed model that uses a central server on 3 healthcare/genomic datasets, and what trade-offs need to be considered when choosing between centralized and decentralized methods. We explain how the use of blockchain technology can help mitigate some of the problems encountered in decentralized methods.

Results: We showed that the discrimination power of ExplorerChain can be statistically similar to that of its central server-based counterpart. While ExplorerChain inherited some benefits of blockchain, it had a slightly longer running time.

Discussion: ExplorerChain has the same prerequisites as a distributed model with a centralized server for coordination. In a manner similar to secure multi-party computation strategies, it assumes that participating institutions are honest but “curious.”

Conclusion: When evaluated on relatively small datasets, results suggest that ExplorerChain, which combines artificial intelligence and blockchain technologies, performs as well as a central server-based method, and may avoid some risks at the cost of efficiency.
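The decentralized, online setting described in the abstract can be illustrated with a toy sketch. This is not the ExplorerChain algorithm: it swaps expectation propagation for plain online gradient steps, and it uses an in-memory append-only list as a stand-in for a permissioned blockchain. All names (`local_update`, `train_round_robin`, `ledger`) and the toy data are hypothetical. The only point it shows is that sites can take turns updating a shared logistic regression model without a central server and without moving patient-level records.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_update(weights, records, lr=0.1):
    """Run one pass of online gradient steps over a single site's records.

    Assumption: plain stochastic gradient descent, used here only as a
    simple substitute for the expectation propagation updates in the paper.
    """
    w = list(weights)
    for features, label in records:
        pred = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
        err = pred - label
        w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w

def train_round_robin(sites, n_features, rounds=50):
    """Pass the model from site to site; only weights are shared."""
    weights = [0.0] * n_features
    ledger = []  # hypothetical stand-in for the blockchain: append-only log
    for r in range(rounds):
        for site_id, records in enumerate(sites):
            weights = local_update(weights, records)
            ledger.append((r, site_id, tuple(weights)))
    return weights, ledger

# Toy data: each record is ([bias term, x], label), with label 1 when x > 0.
sites = [
    [([1.0, -2.0], 0), ([1.0, 1.5], 1)],
    [([1.0, -1.0], 0), ([1.0, 2.0], 1)],
    [([1.0, -0.5], 0), ([1.0, 0.5], 1)],
]
weights, ledger = train_round_robin(sites, n_features=2)
```

In this sketch, each ledger entry records which site produced which model state, which mirrors the role a blockchain could play as a shared, tamper-evident history of model updates. The raw records never leave the `sites` lists.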
