Explicit modeling of ancestry improves polygenic risk scores and BLUP prediction

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color, tanning ability and basal cell carcinoma (BCC) in European Americans (sample sizes ranging from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRS) and Best Linear Unbiased Prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R² for hair color increased by 66% (0.0456 to 0.0755; p < 10⁻¹⁶), the R² for tanning ability increased by 123% (0.0154 to 0.0344; p < 10⁻¹⁶) and the liability-scale R² for BCC increased by 68% (0.0138 to 0.0232; p < 10⁻¹⁶) when explicitly modeling ancestry. Modeling ancestry explicitly prevents ancestry effects from entering into each SNP effect and being over-weighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be under-weighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
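
The comparison described above, a PRS built without ancestry correction versus a PRS with ancestry as a separate model component, can be illustrated with a minimal sketch on simulated data. This is not the authors' pipeline. The simulated ancestry coordinate `anc` (a stand-in for a top principal component), the effect sizes, the single train/test split (the study used 10-fold cross-validation) and all function names are assumptions made for illustration only.

```python
import numpy as np

def r2(y, yhat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, yhat)[0, 1] ** 2)

def residualize(M, c):
    """Remove the linear effect of covariate c (plus an intercept) from M."""
    C = np.column_stack([np.ones_like(c), c])
    coef, *_ = np.linalg.lstsq(C, M, rcond=None)
    return M - C @ coef

def marginal_betas(G, y):
    """Per-SNP simple-regression slopes of y on each column of G."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return (Gc.T @ yc) / (Gc ** 2).sum(axis=0)

def simulate_and_compare(n=2000, m=200, seed=0):
    """Return test-set R^2 for (a) an uncorrected PRS and (b) ancestry + PRS."""
    rng = np.random.default_rng(seed)
    # Hypothetical one-dimensional ancestry coordinate (assumption).
    anc = rng.uniform(0.0, 1.0, n)
    # Allele frequencies drift along the ancestry coordinate.
    base = rng.uniform(0.2, 0.8, m)
    shift = rng.normal(0.0, 0.1, m)
    freq = np.clip(base + np.outer(anc - 0.5, shift), 0.05, 0.95)
    G = rng.binomial(2, freq).astype(float)
    # Phenotype = SNP effects + a direct ancestry effect + noise (assumed values).
    effects = rng.normal(0.0, 0.05, m)
    y = G @ effects + 1.0 * anc + rng.normal(0.0, 1.0, n)

    half = n // 2
    tr, te = slice(0, half), slice(half, n)

    # (a) No correction: ancestry leaks into every marginal SNP effect.
    b_plain = marginal_betas(G[tr], y[tr])
    r2_plain = r2(y[te], G[te] @ b_plain)

    # (b) Ancestry as a separate component: SNP effects are estimated after
    # adjusting for ancestry, then ancestry and PRS get their own weights.
    b_adj = marginal_betas(residualize(G[tr], anc[tr]),
                           residualize(y[tr], anc[tr]))
    X_tr = np.column_stack([np.ones(half), anc[tr], G[tr] @ b_adj])
    w, *_ = np.linalg.lstsq(X_tr, y[tr], rcond=None)
    X_te = np.column_stack([np.ones(n - half), anc[te], G[te] @ b_adj])
    r2_anc = r2(y[te], X_te @ w)
    return r2_plain, r2_anc

r2_plain, r2_anc = simulate_and_compare()
print(f"R^2 without ancestry correction: {r2_plain:.4f}")
print(f"R^2 with ancestry as a separate component: {r2_anc:.4f}")
```

In this sketch the weights for ancestry and the PRS are fitted in the same training half used to estimate the SNP effects, which is a simplification. Which approach scores higher depends on the simulated parameters, so the printed values should be read as an illustration of the two model forms rather than a reproduction of the reported results.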
