Ancestral haplotype-based association mapping with generalized linear mixed models accounting for stratification

MOTIVATION: In many situations, genome-wide association studies are performed in populations presenting stratification. Mixed models including a kinship matrix accounting for genetic relatedness among individuals have been shown to correct for population and/or family structure. Here we extend this methodology to generalized linear mixed models, which can properly model data following various distributions. In addition, we perform association with ancestral haplotypes inferred using a hidden Markov model.

RESULTS: The method was shown to properly account for stratification under various simulated scenarios presenting population and/or family structure. Use of ancestral haplotypes resulted in higher power than use of SNPs on simulated datasets. Application to real data demonstrates the usefulness of the developed model. Full analysis of a dataset with 4600 individuals and 500 000 SNPs was performed in 2 h 36 min and required 2.28 GB of RAM.

AVAILABILITY: The software GLASCOW can be freely downloaded from www.giga.ulg.ac.be/jcms/prod_381171/software.

CONTACT: francois.guillaume@jouy.inra.fr

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
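As an illustration of the kinship matrix mentioned above, the following minimal sketch builds a marker-based genomic relationship matrix from SNP allele counts. It is not taken from GLASCOW: the function name `genomic_relationship`, the 0/1/2 genotype coding, and the scaling by twice the sum of p(1 - p) over SNPs (one common convention) are all assumptions made for the example.

```python
def genomic_relationship(genotypes):
    """Marker-based relationship matrix (illustrative sketch, not GLASCOW code).

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2),
    one entry per SNP. Returns an n x n list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Allele frequency of the counted allele at each SNP, estimated from the sample.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]

    # Assumed scaling: 2 * sum_j p_j * (1 - p_j).
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic; matrix is undefined")

    # Centre each genotype by its expected value 2 * p_j.
    centered = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]

    # G = Z Z' / scale, computed entry by entry.
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centered]
        for zi in centered
    ]


# Toy usage: three individuals genotyped at three SNPs.
toy = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
G = genomic_relationship(toy)
```

In a mixed model, a matrix of this kind would serve as the covariance structure of the random polygenic effect. The pure-Python loops are only meant to make the arithmetic explicit; a dataset of the size quoted in the abstract would need an array library.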