Ontologizing health systems data at scale: making translational discovery a reality

Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8,611 drug ingredients, and 10,673 measurement results, which covered 68–99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies, our algorithm presents new opportunities to advance EHR-based deep phenotyping.
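
As a rough illustration of what concept coverage at a single site could mean, the sketch below computes the share of concepts used at a site that have at least one ontology mapping. This is an assumption about how such a figure might be calculated, not the OMOP2OBO implementation; the function name, concept IDs, and ontology identifiers are hypothetical placeholders.

```python
def mapping_coverage(used_concepts, mappings):
    """Return the fraction of distinct used concept IDs with at least one mapping.

    used_concepts: iterable of concept IDs observed at a site (hypothetical).
    mappings: dict of concept ID -> list of ontology term IDs (hypothetical).
    """
    used = set(used_concepts)
    if not used:
        return 0.0
    # A concept counts as covered only if its mapping list is non-empty.
    covered = {c for c in used if mappings.get(c)}
    return len(covered) / len(used)


# Placeholder data: these IDs are illustrative and not taken from the paper.
example_mappings = {
    1001: ["ONT:0000001"],
    1002: ["ONT:0000002", "ONT:0000003"],
    1003: [],  # concept reviewed, no mapping found
}
site_concepts = [1001, 1002, 1003, 1004, 1001]  # duplicates are collapsed

coverage = mapping_coverage(site_concepts, example_mappings)
print(f"coverage: {coverage:.0%}")
```

Counting distinct concepts treats rare and common codes equally; weighting by how often each concept occurs would be a different, equally plausible definition.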
