Maelstrom Research guidelines for rigorous retrospective data harmonization

Abstract

Background: Data harmonization is widely accepted as crucial: without it, the co-analysis of major tranches of high-quality extant data is liable to inefficiency or error. Yet despite its widespread practice, no formalized or systematic guidelines exist to ensure high-quality retrospective data harmonization.

Methods: To better understand real-world harmonization practices and to facilitate the development of formal guidelines, three interrelated initiatives were undertaken between 2006 and 2015: a phone survey of 34 major international research initiatives, a series of workshops with experts, and case studies applying the proposed guidelines.

Results: A wide range of projects use retrospective harmonization to support their research activities, but even when appropriate approaches are used, the terminologies, procedures, technologies and methods adopted vary markedly. The generic guidelines outlined in this article delineate the essentials required and describe an interdependent step-by-step approach to harmonization: 0) define the research question, objectives and protocol; 1) assemble pre-existing knowledge and select studies; 2) define targeted variables and evaluate harmonization potential; 3) process data; 4) estimate quality of the harmonized dataset(s) generated; and 5) disseminate and preserve final harmonization products.

Conclusions: This manuscript provides guidelines that aim to encourage rigorous and effective approaches to harmonization that are comprehensively and transparently documented and straightforward to interpret and implement. This can be seen as a key step towards implementing guiding principles analogous to those well recognised as essential underpinnings of systematic reviews and the meta-analysis of clinical trials.
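To make steps 2 to 4 of the approach concrete, the following is a minimal sketch in Python. It is not taken from the article. The study names, the source variable names, the target variable `weight_kg`, and the completeness check are all hypothetical, chosen only to show how a target variable, per-study processing rules, and a simple quality indicator might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical target variable (step 2): body weight in kilograms.
TARGET = "weight_kg"


@dataclass
class Rule:
    """Processing rule mapping one study-specific variable to the target."""
    source: Optional[str]  # study variable name; None if the study cannot supply the target
    transform: Optional[Callable[[float], float]] = None


# Illustrative studies and variables (assumptions, not from the article).
RULES = {
    "study_a": Rule("wt_kg", lambda v: v),               # already in kilograms
    "study_b": Rule("wt_lb", lambda v: v * 0.45359237),  # pounds to kilograms
    "study_c": Rule(None),                               # weight not collected
}


def harmonize(study, records):
    """Step 3: apply the study's rule; missing inputs stay missing (None)."""
    rule = RULES[study]
    out = []
    for rec in records:
        raw = rec.get(rule.source) if rule.source else None
        value = None if raw is None else round(rule.transform(raw), 1)
        out.append({"study": study, TARGET: value})
    return out


def completeness(rows):
    """Step 4 (one simple check): share of non-missing harmonized values."""
    return sum(r[TARGET] is not None for r in rows) / len(rows) if rows else 0.0


pooled = (
    harmonize("study_a", [{"wt_kg": 70.0}, {"wt_kg": None}])
    + harmonize("study_b", [{"wt_lb": 154.0}])
    + harmonize("study_c", [{"id": 1}])
)
print(pooled)
print(completeness(pooled))
```

Keeping the rules in a single table, with an explicit entry for a study that cannot supply the target, records the harmonization potential of each study alongside the processing logic. This is one way to keep the transparent documentation the guidelines call for.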
