Multiple Genetic Variant Association Testing by Collapsing and Kernel Methods With Pedigree or Population Structured Data

The search for rare genetic variants associated with complex diseases can be made more efficient by sampling cases from pedigrees enriched for disease, which increases the proportion of diseased carriers of rare variants, possibly together with related or unrelated controls. This strategy, however, complicates analyses because of shared genetic ancestry, as well as linkage disequilibrium among genetic markers. To overcome these problems, we developed broad classes of “burden” statistics and kernel statistics that extend commonly used methods for unrelated case‐control data to allow for known pedigree relationships, for both autosomes and the X chromosome. Furthermore, by replacing pedigree‐based genetic correlation matrices with estimates of genetic relationships based on large‐scale genomic data, our methods can be used to account for population‐structured data. By simulations, we show that the type I error rates of our methods are near the nominal levels when P‐values are computed from asymptotic distributions, which allows rapid computation of P‐values. Our simulations also show that a linear weighted kernel statistic is generally more powerful than a weighted “burden” statistic. Because the proposed statistics are rapid to compute, they can be readily used for large‐scale screening of the association of genomic sequence data with disease status.
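
To make the two classes of statistics concrete, the following is a minimal sketch of a weighted burden statistic and a linear weighted kernel statistic that use a genetic correlation matrix among subjects. It is an illustration under stated assumptions, not the authors' exact formulation: it assumes a score vector U = G'(y - mean(y)), a null covariance of the form (r'Kr) times the genotype covariance among variants, a naive sample estimate of that genotype covariance, and frequency-based weights. The function names, the toy data, and the matrix `K` (for example, twice the kinship matrix, or a genomic relationship estimate) are hypothetical.

```python
import numpy as np

def score_vector(y, G):
    """Return centered trait r and score vector U = G_c' r (one entry per variant)."""
    r = y - y.mean()
    Gc = G - G.mean(axis=0)
    return r, Gc.T @ r

def null_covariance(r, K, G):
    """Assumed null covariance of U: (r' K r) * Sigma_G.

    K is the genetic correlation matrix among subjects (assumption: 2 * kinship
    for autosomes); Sigma_G is a naive sample covariance among variants.
    """
    scale = float(r @ K @ r)
    sigma_g = np.atleast_2d(np.cov(G, rowvar=False))
    return scale * sigma_g

def burden_statistic(U, V, w):
    """Weighted burden: (w'U)^2 / (w'Vw); compared with chi-square, 1 df."""
    return float((w @ U) ** 2 / (w @ V @ w))

def kernel_statistic(U, V, w):
    """Linear weighted kernel: Q = sum_j (w_j U_j)^2.

    Under the null, Q is distributed as a mixture of 1-df chi-squares with
    weights given by the eigenvalues of diag(w) V diag(w).
    """
    Q = float(np.sum((w * U) ** 2))
    A = (w[:, None] * V) * w[None, :]
    lam = np.linalg.eigvalsh(A)
    return Q, lam

# Toy data (hypothetical): 6 subjects, 3 variants, subjects 0 and 1 are siblings.
y = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])           # disease status
G = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 0],
              [0, 0, 1], [2, 1, 0], [0, 0, 0]], dtype=float)
K = np.eye(6)
K[0, 1] = K[1, 0] = 0.5                                 # sibling correlation

p = G.mean(axis=0) / 2.0                                # allele frequency estimates
w = 1.0 / np.sqrt(p * (1.0 - p))                        # up-weight rarer variants

r, U = score_vector(y, G)
V = null_covariance(r, K, G)
T_burden = burden_statistic(U, V, w)
Q, lam = kernel_statistic(U, V, w)
```

The eigenvalues `lam` would then be passed to a routine for the distribution of a quadratic form in normal variables to obtain a P-value for `Q`; that step is omitted here. With `K` set to the identity matrix and a single variant, the burden statistic in this sketch reduces to (n - 1) times the squared sample correlation between trait and genotype, which is a useful sanity check.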
