Association studies for next-generation sequencing.

Genome-wide association studies (GWAS) have become the primary approach for identifying genes with common variants that influence complex diseases. Despite considerable progress, the common variants identified by GWAS account for only a small fraction of disease heritability and are unlikely to explain the majority of the phenotypic variation of common diseases. A potential source of the missing heritability is the contribution of rare variants. Next-generation sequencing technologies will detect millions of novel rare variants, but the resulting data have three defining features: a large number of rare variants, a high proportion of sequence errors, and a large proportion of missing data. These features raise challenges for testing the association of rare variants with phenotypes of interest. In this study, we use a genome continuum model and functional principal components as a general principle for developing novel and powerful association analysis methods designed for resequencing data. We use simulations to calculate the type I error rates and the power of nine alternative statistics: two functional principal component analysis (FPCA)-based statistics, the multivariate principal component analysis (MPCA)-based statistic, the weighted sum statistic (WSS), the variable-threshold (VT) method, the generalized T² statistic, the collapsing method, the combined multivariate and collapsing (CMC) method, and individual tests. We also examine the impact of sequence errors on their type I error rates. Finally, we apply the nine statistics to the published resequencing data set from ANGPTL4 in the Dallas Heart Study. We report that the FPCA-based statistics have higher power to detect association of rare variants and a stronger ability to filter sequence errors than the other seven methods.
