A Fast, Accurate Two-Step Linear Mixed Model for Genetic Analysis Applied to Repeat MRI Measurements

Large-scale biobanks are being collected around the world to better understand human health and risk factors for disease. They often survey hundreds of thousands of individuals, combining questionnaires with clinical, genetic, demographic, and imaging assessments; some of these data may be collected longitudinally. Genetic association analysis of such datasets requires methods that properly handle relatedness, population structure, and other biases introduced by confounders. The most popular and accurate approaches rely on linear mixed model (LMM) algorithms, which are iterative and whose per-iteration computational complexity scales with the square of the sample size. This slows the pace of discovery (up to several days for a single-trait analysis) and limits the use of repeat phenotypic measurements. Here, we describe a new non-iterative approach, the Two-Step Linear Mixed Model (Two-Step LMM), which is accurate, much faster, and has a computational complexity that scales linearly with sample size. We show that the first step retains accurate estimates of heritability (the proportion of trait variance explained by additive genetic factors), even when increasingly complex genetic relationships between individuals are modeled. The second step provides a faster framework for estimating the effect sizes of covariates in the regression model. We applied Two-Step LMM to real data from the UK Biobank, which recently released genotyping information and processed MRI data from 9,725 individuals. Using the left and right hippocampal volume (HV) as repeated measures, we observed increased and more accurate heritability estimates, consistent with simulations.
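For orientation, a standard LMM writes a phenotype vector as y = Xβ + g + e, with genetic effects g ~ N(0, σ_g² K) for a genetic relationship matrix (GRM) K and residuals e ~ N(0, σ_e² I), so that heritability is h² = σ_g² / (σ_g² + σ_e²). The sketch below simulates data from this model (without covariates X) and recovers h² with Haseman-Elston regression, a classical non-iterative moment estimator. It is a swapped-in illustration under these assumptions, not the Two-Step LMM described here, and the helper names (simulate_phenotype, grm, he_regression_h2) are hypothetical. Because it builds the full n x n GRM and uses all n(n-1)/2 pairs, its cost grows with the square of the sample size, which is the scaling the abstract identifies as the bottleneck.

```python
import numpy as np


def simulate_phenotype(n, m, h2, rng):
    """Hypothetical helper: simulate y = Z u + e with SNP heritability h2.

    u ~ N(0, h2/m) per SNP and e ~ N(0, 1 - h2), so Var(y) is about 1.
    """
    # Allele frequencies and genotype counts (0, 1, 2) under Hardy-Weinberg
    freqs = rng.uniform(0.05, 0.5, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    # Standardize each SNP column to mean 0, variance 1
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    u = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    g = Z @ u  # additive genetic component, Cov(g) = h2 * K
    e = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return Z, g + e


def grm(Z):
    """Genetic relationship matrix K = Z Z^T / m (n x n, O(n^2 m) to build)."""
    return Z @ Z.T / Z.shape[1]


def he_regression_h2(y, K):
    """Haseman-Elston regression: regress y_i * y_j on K_ij over pairs i < j.

    With y standardized, E[y_i y_j] = h2 * K_ij, so the slope estimates h2.
    """
    y = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)  # all n(n-1)/2 off-diagonal pairs
    k = K[iu]
    prod = np.outer(y, y)[iu]
    # Ordinary least-squares slope (with intercept) of prod on k
    k_c = k - k.mean()
    return float(k_c @ (prod - prod.mean()) / (k_c @ k_c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z, y = simulate_phenotype(n=2000, m=500, h2=0.5, rng=rng)
    h2_hat = he_regression_h2(y, grm(Z))
    print(f"true h2 = 0.50, HE estimate = {h2_hat:.2f}")
```

The estimate should land near the simulated value, with sampling error shrinking as n grows; iterative REML (as in GCTA) is the usual, more efficient alternative, at higher computational cost.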
