Evaluation of phenotyping errors on polygenic risk score predictions

Accurate disease risk prediction is essential in healthcare for providing personalized disease prevention and treatment strategies, not only to patients but also to the general population. In addition to demographic and environmental factors, advances in genomic research have revealed that genetics plays an important role in determining susceptibility to disease. For most complex diseases, however, individual genetic variants are only weakly to moderately associated with the disease and are therefore not clinically informative for determining disease risk on their own. Nevertheless, recent findings suggest that the combined effect of multiple disease-associated variants, summarized as a polygenic risk score (PRS), can stratify disease risk to a degree similar to that of rare monogenic mutations. The PRS is thus a promising tool for evaluating the genetic contribution to disease risk; however, the quality of the risk prediction depends on many factors, including the precision of the target phenotypes. In this study, we evaluated the impact of phenotyping errors on the accuracy of PRS risk prediction. We used data from the Electronic Medical Records and Genomics (eMERGE) Network to simulate various types of disease phenotypes. For each phenotype, we quantified the impact of phenotyping errors generated by differential and non-differential mechanisms by comparing the prediction accuracy of the PRS on independent testing data. Our results showed that the rate of accuracy degradation depended on both the phenotype and the mechanism of phenotyping error.
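As an illustration of the two error mechanisms, the following minimal Python sketch injects non-differential and differential label errors into simulated training phenotypes and compares the resulting PRS accuracy on held-out data. It is not the study's pipeline. Everything in it is an assumption made for illustration: the liability-threshold simulation, the crude mean-difference variant weights, the use of AUC as the accuracy measure, the error rates, and the choice to make differential errors depend on carrier status at the first variant. All function names are hypothetical.

```python
import random

def simulate_cohort(n, effects, freqs, rng, case_fraction=0.3):
    """Simulate dosages (0/1/2) and a binary phenotype from a liability model."""
    dosages = [[(rng.random() < f) + (rng.random() < f) for f in freqs]
               for _ in range(n)]
    liability = [sum(b * d for b, d in zip(effects, row)) + rng.gauss(0, 1)
                 for row in dosages]
    cutoff = sorted(liability)[int((1 - case_fraction) * n)]
    return dosages, [int(x >= cutoff) for x in liability]

def flip_nondifferential(labels, rate, rng):
    """Flip every label with the same probability, whatever the genotype."""
    return [1 - y if rng.random() < rate else y for y in labels]

def flip_differential(labels, dosages, rate_carrier, rate_other, rng):
    """Flip labels with a probability that depends on carrier status at
    variant 0 (an assumed, illustrative source of differential error)."""
    flipped = []
    for y, row in zip(labels, dosages):
        rate = rate_carrier if row[0] > 0 else rate_other
        flipped.append(1 - y if rng.random() < rate else y)
    return flipped

def estimate_weights(dosages, labels):
    """Crude per-variant weight: mean dosage in cases minus mean in controls."""
    cases = [row for row, y in zip(dosages, labels) if y == 1]
    controls = [row for row, y in zip(dosages, labels) if y == 0]
    return [sum(r[j] for r in cases) / len(cases)
            - sum(r[j] for r in controls) / len(controls)
            for j in range(len(dosages[0]))]

def polygenic_score(dosages, weights):
    """Weighted sum of dosages for each individual."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def run_demo(seed=0, n=2000, n_variants=20):
    """Train weights on clean and error-prone labels; score clean test labels."""
    rng = random.Random(seed)
    effects = [rng.gauss(0, 0.5) for _ in range(n_variants)]
    freqs = [rng.uniform(0.1, 0.5) for _ in range(n_variants)]
    dosages, labels = simulate_cohort(n, effects, freqs, rng)
    half = n // 2
    train_x, train_y = dosages[:half], labels[:half]
    test_x, test_y = dosages[half:], labels[half:]
    training_labels = {
        "clean": train_y,
        "non_differential": flip_nondifferential(train_y, 0.2, rng),
        "differential": flip_differential(train_y, train_x, 0.35, 0.05, rng),
    }
    return {name: auc(polygenic_score(test_x, estimate_weights(train_x, y)), test_y)
            for name, y in training_labels.items()}

results = run_demo()
print(results)
```

The sketch keeps the testing labels error-free so that any drop in AUC reflects only the errors introduced into the training phenotypes. Varying the error rates or the variable that drives the differential errors shows how the degradation can differ between mechanisms in this toy setting.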
