A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research

Background: Research networks currently share data under either a centralized or a federated model. In centralized networks, building multivariate models requires transferring patient-level data to a central computation resource. The authors implemented distributed multivariate models for federated networks, in which patient-level data remain at each site and data exchange policies are managed on a per-study basis.

Objective: The objective was to implement infrastructure that supports the functionality of existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding important new features: algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous responses to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies.

Materials and Methods: Based on requirements gathered from statisticians, administrators, and investigators at multiple institutions, the authors developed infrastructure and tools that support multisite comparative effectiveness studies, using web services for multivariate statistical estimation in the SCANNER federated network.

Results: The authors implemented massively parallel (map-reduce) computation methods and a new policy management system that lets each study initiated by network participants define how data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with very different policies, operating under different state laws.

Discussion and Conclusion: Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. Multivariate analyses can be conducted efficiently and securely without transporting patient-level data, allowing institutions with strict local data storage requirements to participate in sophisticated analyses within federated research networks.
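The abstract does not detail the estimation algorithms. The sketch below shows one standard way an iterative multivariate model can be fitted without moving patient-level data, assuming logistic regression fitted by Newton-Raphson: in each iteration every site computes summary statistics locally (the map step), and a coordinator sums them and updates the coefficients (the reduce step). The function names `site_statistics`, `solve`, and `fit_distributed` are hypothetical and are not part of SCANNER; network transport, security, and policy checks are omitted.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def site_statistics(X: Matrix, y: Vector, beta: Vector) -> Tuple[Vector, Matrix]:
    """Hypothetical 'map' step, run locally at one site.

    Returns the gradient of the site's logistic log-likelihood and its
    information matrix (negative Hessian) at `beta`. Only these aggregates,
    never the rows of X or y, would be sent to the coordinating node.
    """
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))  # predicted probability
        w = mu * (1.0 - mu)                # logistic variance weight
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_distributed(sites: Sequence[Tuple[Matrix, Vector]], p: int,
                    tol: float = 1e-10, max_iter: int = 50) -> Vector:
    """Hypothetical coordinator loop: Newton-Raphson on summed site statistics."""
    beta = [0.0] * p
    for _ in range(max_iter):
        # Map: each site computes its own aggregates (a remote call in a real network).
        stats = [site_statistics(X, y, beta) for X, y in sites]
        # Reduce: gradients and information matrices are additive across sites.
        grad = [sum(g[j] for g, _ in stats) for j in range(p)]
        info = [[sum(h[j][k] for _, h in stats) for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy data: two sites, columns are (intercept, covariate).
    site_a = ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [0.0, 1.0, 0.0, 1.0])
    site_b = ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [0.0, 0.0, 1.0, 1.0])
    print("distributed:", fit_distributed([site_a, site_b], p=2))
    pooled = (site_a[0] + site_b[0], site_a[1] + site_b[1])
    print("pooled:     ", fit_distributed([pooled], p=2))
```

Because the gradient and information matrix are sums over patients, the distributed fit matches a fit on pooled data up to floating-point rounding. This additivity is what makes such iterative models a natural match for map-reduce style computation.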