Using whole genome scores to compare three clinical phenotyping methods in complex diseases

Genome-wide association studies depend on accurate ascertainment of patient phenotype. However, phenotyping is difficult and, because of the expense involved, is often treated as an afterthought in these studies. Electronic health records (EHRs) may provide higher-fidelity phenotypes for genomic research than other sources such as administrative data. We used whole-genome association models to evaluate different EHR-based and administrative-data-based phenotyping methods in a cohort of 16,858 Caucasian subjects for type 1 diabetes mellitus, type 2 diabetes mellitus, coronary artery disease and breast cancer. For each disease, we trained and evaluated polygenic models using three different phenotype definitions: phenotypes derived from billing data, from the clinical problem list, or from a curated phenotyping algorithm. We observed that, for these diseases, the curated phenotype outperformed the problem list, and the problem list outperformed administrative billing data. This suggests that using advanced EHR-derived phenotypes can further increase the power of genome-wide association studies.
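The comparison described above can be illustrated with a minimal sketch on simulated data. It is not the study's pipeline: the abstract does not specify the model, so the per-SNP weighting (mean allele count in cases minus controls), the train/test split, the simulated genotypes, and the misclassification rates assigned to each phenotype definition are all assumptions made for illustration, as are the function names `auc`, `polygenic_auc` and `simulate`.

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity (assumes no tied scores)."""
    labels = np.asarray(labels, dtype=bool)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def polygenic_auc(genotypes, phenotype, train_frac=0.7):
    """Fit a crude polygenic score on a training split and return its held-out AUC.

    Assumption: each SNP weight is the difference in mean allele count
    between cases and controls; a real analysis would use a proper
    association model with covariates.
    """
    n_train = int(len(phenotype) * train_frac)
    g_train, g_test = genotypes[:n_train], genotypes[n_train:]
    y_train, y_test = phenotype[:n_train], phenotype[n_train:]
    weights = g_train[y_train == 1].mean(axis=0) - g_train[y_train == 0].mean(axis=0)
    return auc(y_test, g_test @ weights)

def simulate(n=4000, m=200, seed=0):
    """Simulate genotypes and three noisy versions of one true phenotype.

    The misclassification rates below are hypothetical, chosen only to
    mimic phenotype definitions of differing fidelity.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, m)
    genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
    effects = rng.normal(0.0, 0.15, m)
    liability = (genotypes - 2 * freqs) @ effects + rng.normal(0.0, 1.0, n)
    truth = (liability > np.quantile(liability, 0.8)).astype(int)

    def mislabel(rate):
        # Flip each true label independently with probability `rate`.
        flip = rng.random(n) < rate
        return np.where(flip, 1 - truth, truth)

    phenotypes = {
        "billing": mislabel(0.20),
        "problem_list": mislabel(0.10),
        "curated": mislabel(0.02),
    }
    return genotypes, phenotypes

if __name__ == "__main__":
    genotypes, phenotypes = simulate()
    for name, y in phenotypes.items():
        print(f"{name}: held-out AUC = {polygenic_auc(genotypes, y):.3f}")
```

Each phenotype definition is used both to train and to evaluate its own model, mirroring the design described in the abstract; under this setup, a phenotype with less label noise would generally be expected to yield a higher held-out AUC, though the sketch makes no claim about the magnitudes reported in the study.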
