Cohort Harmonization and Integrative Analysis From a Biomedical Engineering Perspective

This review outlines the critical components and milestones of data harmonization from a biomedical engineering perspective. The need to share data between heterogeneous sources paves the way for cohort harmonization, thereby fostering data integration and interdisciplinary research. Unmet needs in chronic diseases, as well as in other diseases, can be addressed by integrating patient health records and sharing information on the clinical picture and outcome. The impact of cohort harmonization on patient-centered clinical research and precision medicine is characterized by the stratification of patients, the determination of various clinical and outcome features, and the identification of novel biomarkers for the different phenotypes of a disease. The establishment of matching techniques and ontologies for the creation of data schemas is also presented. Web technologies and data-collection tools offer opportunities to achieve new levels of integration and interoperability. Ethical and legal issues that arise when sharing and harmonizing individual-level data are discussed in order to evaluate the harmonization potential. Use cases that shape and test the harmonization approach are analyzed, along with their significant results with respect to their research objectives. Finally, future trends and directions are discussed and critically reviewed toward a roadmap for cohort harmonization in clinical medicine.
