Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation

Background The phecode system was built upon the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) for phenome-wide association studies (PheWAS) using the electronic health record (EHR).
Objective The goal of this paper was to develop and perform an initial evaluation of maps from the International Classification of Diseases, 10th Revision (ICD-10) and the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes to phecodes.
Methods We mapped ICD-10 and ICD-10-CM codes to phecodes using a number of methods and resources, such as concept relationships and explicit mappings from the Centers for Medicare & Medicaid Services, the Unified Medical Language System, Observational Health Data Sciences and Informatics, Systematized Nomenclature of Medicine-Clinical Terms, and the National Library of Medicine. We assessed the coverage of the maps in two databases: Vanderbilt University Medical Center (VUMC) using ICD-10-CM and the UK Biobank (UKBB) using ICD-10. We assessed the fidelity of the ICD-10-CM map in comparison to the gold-standard ICD-9-CM phecode map by investigating phenotype reproducibility and conducting a PheWAS.
Results We mapped >75% of ICD-10 and ICD-10-CM codes to phecodes. Of the unique codes observed in the UKBB (ICD-10) and VUMC (ICD-10-CM) cohorts, >90% were mapped to phecodes. We observed 70-75% reproducibility for chronic diseases and <10% for an acute disease for phenotypes sourced from the ICD-10-CM phecode map. Using the ICD-9-CM and ICD-10-CM maps, we conducted a PheWAS with a Lipoprotein(a) genetic variant, rs10455872, which replicated two known genotype-phenotype associations with similar effect sizes: coronary atherosclerosis (ICD-9-CM: P<.001; odds ratio (OR) 1.60 [95% CI 1.43-1.80] vs ICD-10-CM: P<.001; OR 1.60 [95% CI 1.43-1.80]) and chronic ischemic heart disease (ICD-9-CM: P<.001; OR 1.56 [95% CI 1.35-1.79] vs ICD-10-CM: P<.001; OR 1.47 [95% CI 1.22-1.77]).
Conclusions This study introduces the beta versions of ICD-10 and ICD-10-CM to phecode maps that enable researchers to leverage accumulated ICD-10 and ICD-10-CM data for PheWAS in the EHR.
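
The coverage assessment described in the Methods (the share of unique codes observed in a cohort that map to at least one phecode) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `map_coverage`, the dictionary layout, and the toy code-to-phecode pairs are all assumptions made for illustration, and the toy values are not taken from the published maps.

```python
from typing import Dict, Iterable, Set


def map_coverage(observed_codes: Iterable[str],
                 code_to_phecodes: Dict[str, Set[str]]) -> float:
    """Fraction of unique observed codes that map to at least one phecode."""
    unique = set(observed_codes)  # repeated codes count only once
    if not unique:
        return 0.0
    # A code counts as mapped only if it is in the map with a non-empty target set.
    mapped = {code for code in unique if code_to_phecodes.get(code)}
    return len(mapped) / len(unique)


# Hypothetical toy map (illustrative values, not the published ICD-10-CM map).
toy_map = {
    "I25.10": {"411.4"},
    "E11.9": {"250.2"},
    "Z00.00": set(),  # present in the source vocabulary but with no phecode
}

# Hypothetical observed codes; "X99.9" is absent from the map altogether.
observed = ["I25.10", "I25.10", "E11.9", "Z00.00", "X99.9"]

coverage = map_coverage(observed, toy_map)
print(f"{coverage:.0%} of unique observed codes mapped")
```

Counting unique codes rather than code occurrences matches how the Results report coverage ("of the unique codes observed"); weighting by occurrence would answer a different question, namely how much of the recorded billing volume is captured.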
