Comparison of Methods to Account for Relatedness in Genome-Wide Association Studies with Family-Based Data

Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of LMM methods and software packages has been developed, but it is not always clear how (or indeed whether) any newly proposed method differs from previously proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in the precise details of the methodology and in various user-chosen options, such as the method and number of SNPs used to estimate the kinship (relatedness) matrix. We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. Given their strong concordance, in most applications the choice of precise LMM implementation cannot be based on power or type I error considerations but must instead be based on considerations such as speed and ease of use.
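To make two of the quantities mentioned above concrete, the following is a minimal sketch of how a kinship (relatedness) matrix might be estimated from SNP genotypes and how genome-wide inflation of test statistics might be summarised. It assumes genotypes coded as 0/1/2 allele counts with no missing values, uses one common standardised estimator, and is not necessarily the estimator used by any of the packages compared here. The function names `kinship_matrix` and `genomic_inflation` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import median


def kinship_matrix(genotypes):
    """Estimate an n x n relatedness matrix from 0/1/2 allele counts.

    genotypes: list of n rows (individuals), each with m SNP genotypes.
    Assumption: no missing genotypes; monomorphic SNPs are skipped.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    z_cols = []  # standardised genotype columns, one per usable SNP
    for j in range(m):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n)  # sample allele frequency
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNP carries no information
        sd = sqrt(2.0 * p * (1.0 - p))
        z_cols.append([(g - 2.0 * p) / sd for g in col])
    if not z_cols:
        raise ValueError("no polymorphic SNPs available")
    used = len(z_cols)
    # K[i][k] is the average over SNPs of the product of standardised genotypes
    return [[sum(z[i] * z[k] for z in z_cols) / used for k in range(n)]
            for i in range(n)]


def genomic_inflation(chi2_stats):
    """Ratio of the observed median 1-df chi-square statistic to its
    expected median under the null (approximately 0.4549)."""
    return median(chi2_stats) / 0.4549364


# Toy data (hypothetical): 4 individuals, 3 SNPs
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
K = kinship_matrix(toy_genotypes)
```

An inflation factor close to 1 is the usual informal indication that the genome-wide error rate is under control; values well above 1 suggest residual confounding by substructure or relatedness.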
