Improving the phenotype risk score as a scalable approach to identifying patients with Mendelian disease

OBJECTIVE The Phenotype Risk Score (PheRS) is a method to detect Mendelian disease patterns using phenotypes from the electronic health record (EHR). We compared the performance of different approaches to mapping EHR phenotypes to Mendelian disease features.

MATERIALS AND METHODS PheRS uses Mendelian disease descriptions annotated with Human Phenotype Ontology (HPO) terms. In previous work, we presented a map linking phecodes (based on International Classification of Diseases [ICD]-Ninth Revision) to HPO terms. For this study, we integrated ICD-Tenth Revision codes and lab data. We also created a new map linking HPO terms to customized groupings of ICD codes. We compared the performance of these approaches on cases and controls for 16 Mendelian diseases using 2.5 million de-identified medical records.

RESULTS PheRS effectively distinguished cases from controls for all 16 positive controls and all approaches tested (P < 4 × 10⁻¹⁶). Adding lab data led to a statistically significant improvement for 4 of 14 diseases. The custom ICD groupings improved specificity, leading to an average 8% increase in precision at 100 (range: -2% to 22%). Eight of 10 adults with cystic fibrosis tested had a PheRS in the 95th percentile prior to diagnosis.

DISCUSSION Both phecodes and custom ICD groupings were able to detect differences between affected cases and controls at the population level. The ICD map showed better precision for the highest-scoring individuals. Adding lab data improved performance at detecting population-level differences.

CONCLUSIONS PheRS is a scalable method to study Mendelian disease at the population level using EHR data and can potentially be used to find patients with undiagnosed Mendelian disease.
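The abstract does not spell out how a score is computed or how precision at 100 is measured, so the following is only a minimal sketch under stated assumptions. It assumes each phenotype is weighted by the negative log of its prevalence in the cohort (the weighting described in earlier PheRS work) and that a patient's score is the sum of weights for the disease features they carry. All function names, variable names, and the toy data layout are hypothetical and are not taken from the study.

```python
import math
from typing import Dict, Set


def phenotype_weights(patients: Dict[str, Set[str]]) -> Dict[str, float]:
    """Weight each phenotype by -ln(prevalence) so rarer phenotypes count more.

    Assumption: prevalence is estimated from the same cohort being scored.
    """
    n = len(patients)
    counts: Dict[str, int] = {}
    for phenotypes in patients.values():
        for phenotype in phenotypes:
            counts[phenotype] = counts.get(phenotype, 0) + 1
    return {p: -math.log(c / n) for p, c in counts.items()}


def phers(patient_phenotypes: Set[str],
          disease_features: Set[str],
          weights: Dict[str, float]) -> float:
    """Sum the weights of the disease features present in one patient."""
    return sum(weights.get(p, 0.0) for p in patient_phenotypes & disease_features)


def precision_at_k(scores: Dict[str, float], cases: Set[str], k: int) -> float:
    """Fraction of the k highest-scoring patients who are known cases."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = sorted(scores, key=lambda pid: scores[pid], reverse=True)[:k]
    return sum(1 for pid in top_k if pid in cases) / k
```

Under these assumptions, scoring a cohort means calling `phenotype_weights` once, then `phers` per patient with the mapped feature set for one disease, and finally `precision_at_k(scores, cases, 100)` to obtain a figure comparable in spirit to the precision at 100 reported above. Swapping the phecode map for the custom ICD groupings would change only which phenotype labels appear in each patient's set.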
