Developing and Evaluating Mappings of ICD-10 and ICD-10-CM Codes to Phecodes

Many studies of electronic health record (EHR) data rely on custom aggregations of billing codes to enable clinical and genetic research, including phenome-wide association studies (PheWAS). One such grouping is the phecode system, originally developed for PheWAS. Phecodes were built upon the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). However, many healthcare systems across the world use ICD-10 or ICD-10-CM, and the United States switched from ICD-9-CM to ICD-10-CM in 2015. Here we present our work on developing and validating mappings from both ICD-10 and ICD-10-CM to phecodes. We first assessed the coverage of the ICD-10 and ICD-10-CM phecode maps in two large databases: Vanderbilt University Medical Center (VUMC), which uses ICD-10-CM, and the UK Biobank (UKBB), which uses ICD-10. We then evaluated the validity of the ICD-10-CM map by comparing the prevalence of phecodes derived from ICD-9-CM and from ICD-10-CM at VUMC. Approximately 75% of all instances of ICD-10-CM codes and 80% of ICD-10 codes were successfully mapped to phecodes. To demonstrate the utility of the ICD-10-CM map, we further performed a PheWAS using the ICD-9-CM and ICD-10-CM maps. This work provides an initial high-coverage map of ICD-10 and ICD-10-CM to phecodes. The maps are publicly available to aid EHR-based investigation.
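The instance-level coverage reported above can be illustrated with a minimal sketch. It assumes coverage is the share of observed code instances whose code appears in the map. The function name `instance_coverage`, the toy map, and the observed codes are all hypothetical illustrations, not the published map or the authors' pipeline.

```python
from collections import Counter

# Toy ICD-10-CM -> phecode map. These entries are illustrative assumptions,
# not values taken from the published map.
TOY_MAP = {
    "E11.9": "250.2",
    "I10": "401.1",
}


def instance_coverage(code_instances, code_to_phecode):
    """Return the fraction of code instances whose code has a phecode.

    Each occurrence of a code counts once, so frequently used codes
    weigh more than rare ones.
    """
    counts = Counter(code_instances)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mapped = sum(n for code, n in counts.items() if code in code_to_phecode)
    return mapped / total


# Hypothetical billing-code instances; "X00.0" is a made-up unmapped code.
observed = ["E11.9", "E11.9", "I10", "X00.0"]
coverage = instance_coverage(observed, TOY_MAP)
print(f"{coverage:.0%} of code instances mapped")
```

Counting instances rather than distinct codes means a map can reach high coverage even if many rarely used codes remain unmapped.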
