A Rapid Generalized Least Squares Model for a Genome-Wide Quantitative Trait Association Analysis in Families

Genome-wide association studies (GWAS) using family data involve association analyses between hundreds of thousands of markers and a trait in a large number of related individuals. The correlations among relatives pose statistical and computational challenges for these large-scale association analyses. Several rapid methods that account for both within- and between-family variation have recently been proposed. However, most of these techniques model phenotypic similarity in terms of genetic relatedness only. In many family-based studies, such as twin studies, familial resemblance arises not only from genetic relatedness but also from shared environmental effects and assortative mating. In this paper, we propose two generalized least squares (GLS) models for rapid association analysis of family-based GWAS that accommodate both genetic and environmental contributions to familial resemblance. The first model estimates the genetic and environmental variation jointly; the second estimates the genetic and environmental components separately. Through simulation studies, we demonstrate that the proposed approaches are more powerful and computationally efficient than a number of existing methods. We also show that estimating the residual variance-covariance matrix of the GLS models without SNP effects does not appreciably bias the p values, provided the SNP effect is small (i.e., it accounts for no more than 1% of the trait variance).
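The computational idea behind the speed claim, estimating the familial residual covariance once under a null model without SNP effects and then reusing it for every per-SNP GLS test, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every family has the same size and member ordering, so that a single unstructured covariance block (absorbing genetic and shared-environmental resemblance together, loosely analogous to the first model) can be estimated from null-model OLS residuals. The function names `estimate_family_cov`, `gls_snp_scan`, and `simulate_toy_data` are hypothetical, and each SNP is tested with a Wald statistic that treats the estimated covariance as known (feasible GLS).

```python
import math

import numpy as np


def estimate_family_cov(y, Z, n_fam, fam_size):
    """Null model (no SNP): OLS of y on covariates Z, then an unstructured
    fam_size x fam_size covariance of the residual vectors across families.
    Assumption: rows are ordered family by family, same member order in each."""
    beta0, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = (y - Z @ beta0).reshape(n_fam, fam_size)
    return np.cov(resid, rowvar=False)


def gls_snp_scan(y, Z, G, n_fam, fam_size):
    """Feasible-GLS Wald test for each SNP column of G, reusing one covariance
    block estimated once under the null model."""
    block = estimate_family_cov(y, Z, n_fam, fam_size)
    # With block = L L^T, applying L^{-1} within each family whitens the errors.
    L_inv = np.linalg.inv(np.linalg.cholesky(block))

    def whiten(M):
        # Reshape to (family, member, column) and apply L^{-1} to each family.
        M3 = M.reshape(n_fam, fam_size, -1)
        return np.einsum("ij,fjk->fik", L_inv, M3).reshape(n_fam * fam_size, -1)

    y_w = whiten(y).ravel()
    Z_w, G_w = whiten(Z), whiten(G)
    betas, pvals = [], []
    for j in range(G.shape[1]):
        X_w = np.column_stack([Z_w, G_w[:, j]])
        cov_b = np.linalg.inv(X_w.T @ X_w)  # equals (X' V^{-1} X)^{-1}
        b = cov_b @ (X_w.T @ y_w)           # equals GLS estimate with full V
        z = b[-1] / math.sqrt(cov_b[-1, -1])
        betas.append(b[-1])
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))  # two-sided normal p
    return np.array(betas), np.array(pvals)


def simulate_toy_data(n_fam=300, fam_size=4, n_snps=5, effect=0.4, rho=0.5, seed=0):
    """Hypothetical toy data: exchangeable within-family residual correlation rho
    stands in for combined genetic + shared-environment resemblance. Genotypes
    are drawn independently per person (ignores Mendelian inheritance).
    Only SNP 0 is causal."""
    rng = np.random.default_rng(seed)
    n = n_fam * fam_size
    block = rho * np.ones((fam_size, fam_size)) + (1 - rho) * np.eye(fam_size)
    e = (rng.standard_normal((n_fam, fam_size)) @ np.linalg.cholesky(block).T).ravel()
    Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
    G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
    y = Z @ np.array([1.0, 0.5]) + effect * G[:, 0] + e
    return y, Z, G


if __name__ == "__main__":
    y, Z, G = simulate_toy_data()
    betas, pvals = gls_snp_scan(y, Z, G, n_fam=300, fam_size=4)
    print(np.round(betas, 3), pvals)
```

Because the covariance block is factored once, each additional SNP costs only a small least-squares fit on pre-whitened data. Re-estimating the covariance with every SNP in the model would avoid the approximation but would be far slower; the abstract's claim that the resulting bias in p values is negligible for small SNP effects concerns exactly this trade-off.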
