EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations

We develop a novel approach to identify regions of the genome underlying population genetic differentiation in any genetic data where the underlying population structure is unknown, or where the interest is in assessing divergence along a gradient. Our approach, EigenGWAS, combines the statistical framework for genome-wide association studies (GWASs) with eigenvector decomposition, which is commonly used in population genetics to characterize the structure of genetic data, so that loci under selection can be identified without requiring discrete populations. We show through theory and simulation that our approach can identify regions under selection along gradients of ancestry. In real data we confirm this by demonstrating that LCT is under selection between the HapMap CEU and TSI cohorts, and we then validate this selection signal across European countries in the POPRES samples. HERC2 was also found to be differentiated both between the CEU and TSI cohorts and within the POPRES sample, likely reflecting anthropological differences in skin and hair colour between northern and southern European populations. Controlling for population stratification is of great importance in any quantitative genetic study, and our approach also provides a simple, fast and accurate way of predicting principal components in independent samples. With ever-increasing sample sizes across many fields, this approach is likely to be widely used to obtain individual-level eigenvectors while avoiding the computational challenges of conducting singular value decomposition in large data sets. We have developed freely available software, Genetic Analysis Repository (GEAR), to facilitate the application of the methods.
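
The core idea, treating an eigenvector of the genetic data as the phenotype in a single-marker association scan, can be illustrated with a minimal NumPy sketch. This is not the GEAR implementation. The function names (`simulate_two_populations`, `eigengwas`, `project`), the toy simulation settings, and the use of a genomic-control style rescaling of the test statistics are assumptions made here for illustration only.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.


def simulate_two_populations(n_per_pop=200, n_snps=1000, seed=0):
    """Hypothetical toy data: two populations separated by mild drift, plus
    one strongly differentiated SNP (index 0) standing in for a selected locus."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.2, 0.8, n_snps)
    drift = rng.normal(0.0, 0.07, n_snps)
    p1 = np.clip(ancestral + drift, 0.05, 0.95)
    p2 = np.clip(ancestral - drift, 0.05, 0.95)
    p1[0], p2[0] = 0.1, 0.9  # the "selected" locus
    g1 = rng.binomial(2, p1, size=(n_per_pop, n_snps))
    g2 = rng.binomial(2, p2, size=(n_per_pop, n_snps))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack([g1, g2]).astype(float), labels


def eigengwas(genotypes, component=0):
    """Regress one eigenvector of the genetic relationship matrix on each SNP."""
    G = np.asarray(genotypes, dtype=float)
    n, m = G.shape
    mean, sd = G.mean(axis=0), G.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # monomorphic SNPs become all-zero columns
    Z = (G - mean) / sd

    # Eigen-decomposition of the n x n relationship matrix.
    eigvals, eigvecs = np.linalg.eigh(Z @ Z.T / m)
    order = np.argsort(eigvals)[::-1]  # eigh returns ascending eigenvalues
    y = eigvecs[:, order[component]]
    y = (y - y.mean()) / y.std()

    # With y and each SNP standardized, the regression slope equals the
    # Pearson correlation r, and the squared t statistic is
    # (n - 2) * r^2 / (1 - r^2).
    r = Z.T @ y / n
    chi2 = (n - 2) * r**2 / (1.0 - r**2)

    # Assumption: rescale by the genomic inflation factor so that the
    # genome-wide background differentiation is absorbed.
    lambda_gc = np.median(chi2) / CHI2_1DF_MEDIAN
    return {"mean": mean, "sd": sd, "r": r, "chi2": chi2,
            "chi2_gc": chi2 / lambda_gc, "lambda_gc": lambda_gc,
            "eigenvalue": eigvals[order[component]]}


def project(new_genotypes, fit):
    """Predict the eigenvector in independent samples as a weighted sum of
    standardized genotypes, using the per-SNP effects as weights."""
    Z_new = (np.asarray(new_genotypes, dtype=float) - fit["mean"]) / fit["sd"]
    return Z_new @ fit["r"] / fit["r"].size


# Usage: fit on half of the simulated individuals, predict in the other half.
G, labels = simulate_two_populations()
fit = eigengwas(G[::2])
scores = project(G[1::2], fit)
top_snp = int(np.argmax(fit["chi2_gc"]))
```

In this toy setting the population labels are never passed to `eigengwas`; the scan relies only on the leading eigenvector. The `project` function reuses the estimated SNP effects as weights, which avoids a second decomposition when scoring new individuals.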
