Fast and powerful genome wide association of dense genetic data with high dimensional imaging phenotypes

Genome-wide association (GWA) analysis of brain imaging phenotypes can advance our understanding of the genetic basis of normal and disorder-related variation in the brain. GWA approaches typically use linear mixed effect models to account for non-independence among subjects due to factors such as family relatedness and population structure. Using these models with high-dimensional imaging phenotypes is challenging, both in computational cost and in the need to account for multiple testing in the imaging and genetic domains. Here we present a method that makes mixed models practical for high-dimensional traits by combining a transformation applied to the data and model with a non-iterative variance component estimator. With these speed enhancements, permutation tests become feasible, which allows inference on powerful spatial tests such as the cluster size statistic.

Genome-wide association studies (GWAS) of neuroimaging data pose a significant computational burden because of the need to correct for multiple testing in both the genetic and the imaging data. Here, Ganjgahi et al. develop WLS-REML, which significantly reduces computation time in brain imaging GWAS.
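The two ingredients named in the abstract, a transformation of the data and model plus a non-iterative variance component estimator, can be illustrated with a minimal sketch. It is not the authors' WLS-REML implementation. It assumes the transformation is a rotation by the eigenvectors of a kinship matrix `K`, so that the covariance `sigma2_g * K + sigma2_e * I` becomes diagonal, and that the variance components come from a single regression of squared OLS residuals on the eigenvalues. All function and variable names are hypothetical.

```python
import numpy as np

def rotate(K, y, X):
    """Rotate phenotype y and design X by the eigenvectors of K.

    If cov(y) = sigma2_g * K + sigma2_e * I and K = S diag(d) S^T,
    then cov(S^T y) = sigma2_g * diag(d) + sigma2_e * I, which is diagonal.
    """
    d, S = np.linalg.eigh(K)
    return S.T @ y, S.T @ X, d

def one_step_variance_components(y_star, X_star, d):
    """Non-iterative (one-step) estimate of (sigma2_g, sigma2_e).

    Assumption of this sketch: regress squared OLS residuals of the
    rotated model on [1, d]; the intercept approximates sigma2_e and
    the slope approximates sigma2_g. Negative estimates are set to 0.
    """
    beta_ols, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    f = (y_star - X_star @ beta_ols) ** 2
    U = np.column_stack([np.ones_like(d), d])
    theta, *_ = np.linalg.lstsq(U, f, rcond=None)
    sigma2_e, sigma2_g = np.maximum(theta, 0.0)
    return sigma2_g, sigma2_e

def wls_fit(y_star, X_star, d, sigma2_g, sigma2_e):
    """Weighted least squares in the rotated space, using the diagonal variances."""
    var = np.maximum(sigma2_g * d + sigma2_e, 1e-12)  # guard against zero variance
    XtW = X_star.T / var
    cov_beta = np.linalg.inv(XtW @ X_star)
    beta = cov_beta @ (XtW @ y_star)
    return beta, cov_beta
```

Because the eigendecomposition of `K` does not depend on the phenotype, in a design like this it could be computed once and reused across every voxel and every permutation, which is the kind of saving that makes permutation inference plausible at this scale.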
