Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies

Introduction: Existing observational data are increasingly used to produce empirical evidence in health care research quickly and transparently. Multiple databases are often combined to increase power, to assess rare exposures or outcomes, or to study diverse populations. For privacy and sociological reasons, original data on individual subjects cannot be shared, which requires a distributed network approach in which data processing is performed locally before data are shared.
Case Descriptions and Variation Among Sites: We created a conceptual framework distinguishing three steps in local data processing: (1) data reorganization into a data structure common across the network; (2) derivation of study variables not present in original data; and (3) application of study design to transform longitudinal data into aggregated data sets for statistical analysis. We applied this framework to four case studies to identify similarities and differences in the United States and Europe: Exploring and Understanding Adverse Drug Reactions by Integrative Mining of Clinical Records and Biomedical Knowledge (EU-ADR), Observational Medical Outcomes Partnership (OMOP), the Food and Drug Administration’s (FDA’s) Mini-Sentinel, and the Italian network, the Integration of Content Management Information on the Territory of Patients with Complex Diseases or with Chronic Conditions (MATRICE).
Findings: National networks (OMOP, Mini-Sentinel, MATRICE) all adopted shared procedures for local data reorganization. The multinational EU-ADR network needed locally defined procedures to reorganize its heterogeneous data into a common structure. Derivation of new data elements was centrally defined in all networks, but the procedure was not shared in EU-ADR. Application of study design was a common and shared procedure in all the case studies. Computer procedures were implemented in different programming languages, including SAS, R, SQL, Java, and C++.
Conclusion: Using our conceptual framework, we found several areas that would benefit from research to identify optimal standards for production of empirical knowledge from existing databases.
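
The three local processing steps named in the framework can be illustrated with a minimal sketch. It is not taken from any of the four networks: the `Event` structure, the function names, the column names, and the sample records are all hypothetical, and the study design shown is a deliberately crude exposed-by-outcome tabulation that ignores timing. The codes `M01A` (ATC, non-steroidal anti-inflammatory products) and `I21` (ICD-10, acute myocardial infarction) are used only as examples.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical common data structure shared across the network."""
    person_id: str
    day: int
    kind: str  # "drug" or "diagnosis"
    code: str


def reorganize(local_rows, column_map):
    """Step 1: map site-specific column names onto the common structure."""
    return [
        Event(**{common: row[local] for common, local in column_map.items()})
        for row in local_rows
    ]


def derive_outcome(events, outcome_codes):
    """Step 2: derive a study variable absent from the original data,
    here the first day each person has a diagnosis in outcome_codes."""
    first = {}
    for e in events:
        if e.kind == "diagnosis" and e.code in outcome_codes:
            first[e.person_id] = min(e.day, first.get(e.person_id, e.day))
    return first


def apply_design(events, first_outcome, drug_code):
    """Step 3: collapse longitudinal events into aggregated counts,
    keyed by (exposed, has_outcome). Only this table would be shared."""
    persons = {e.person_id for e in events}
    exposed = {
        e.person_id for e in events if e.kind == "drug" and e.code == drug_code
    }
    counts = Counter((p in exposed, p in first_outcome) for p in persons)
    return dict(counts)


# Hypothetical records as one site might store them locally.
local_rows = [
    {"pid": "a", "d": 1, "type": "drug", "c": "M01A"},
    {"pid": "a", "d": 9, "type": "diagnosis", "c": "I21"},
    {"pid": "b", "d": 2, "type": "drug", "c": "M01A"},
    {"pid": "c", "d": 3, "type": "diagnosis", "c": "I21"},
    {"pid": "d", "d": 4, "type": "diagnosis", "c": "J45"},
]
column_map = {"person_id": "pid", "day": "d", "kind": "type", "code": "c"}

events = reorganize(local_rows, column_map)
first_outcome = derive_outcome(events, {"I21"})
table = apply_design(events, first_outcome, "M01A")
print(table)
```

In this sketch only `column_map` is site-specific, while the derivation and design functions could be distributed unchanged to every site. That mirrors the distinction the framework draws between locally defined and shared procedures.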
