Finding disease variants in Mendelian disorders by using sequence data: methods and applications.

Many sequencing studies are now underway to identify the genetic causes of both Mendelian and complex traits. Through exome sequencing, genes harboring variants implicated in several Mendelian traits have already been identified. The underlying methodology in these studies is a multistep algorithm that filters the variants identified in a small number of affected individuals according to whether they are novel (not yet seen in public resources such as dbSNP), whether they are shared among affected individuals, and what external functional information is available on them. Although intuitive, these filter-based methods are nonoptimal and do not provide any measure of statistical uncertainty. We describe here a formal statistical approach that has several distinct advantages: (1) it provides fast computation of approximate p values for individual genes, (2) it adjusts for the background variation in each gene, (3) it allows for incorporation of functional or linkage-based information, and (4) it accommodates designs based on both affected relative pairs and unrelated affected individuals. We show via simulations that the proposed approach can be used in conjunction with the existing filter-based methods to achieve a substantially better ranking of a gene relevant for disease than currently used filter-based approaches achieve on their own; this is especially so in the presence of disease locus heterogeneity. We revisit recent studies on three Mendelian diseases and show that the proposed approach results in the implicated gene being ranked first in all studies, with approximate p values of 10⁻⁶ for the Miller Syndrome gene, 1.0 × 10⁻⁴ for the Freeman-Sheldon Syndrome gene, and 3.5 × 10⁻⁵ for the Kabuki Syndrome gene.
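As a rough illustration of the filter-based strategy described in the abstract (not of the proposed statistical approach, whose details are not given here), the following minimal sketch keeps only novel variants and then retains genes in which enough affected individuals carry at least one such variant. The function name `filter_candidate_genes`, the input layout, and the gene and variant identifiers are all hypothetical, chosen only for this example.

```python
from collections import defaultdict


def filter_candidate_genes(calls, known_variants, min_carriers=None):
    """Toy filter-based gene ranking (illustrative assumption, not the paper's method).

    calls: dict mapping an affected individual to a list of (gene, variant_id).
    known_variants: set of variant ids already seen in a public resource.
    min_carriers: minimum number of affected carriers; defaults to all of them.
    """
    carriers = defaultdict(set)
    for person, variants in calls.items():
        for gene, variant_id in variants:
            # Novelty filter: drop variants already present in the public set.
            if variant_id not in known_variants:
                carriers[gene].add(person)

    needed = len(calls) if min_carriers is None else min_carriers
    # Rank by number of affected carriers (descending), then by gene name.
    ranked = sorted(carriers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(people)) for gene, people in ranked if len(people) >= needed]


# Hypothetical data: three affected individuals, made-up identifiers.
calls = {
    "A1": [("GENE1", "v1"), ("GENE2", "v2")],
    "A2": [("GENE1", "v3"), ("GENE3", "v4")],
    "A3": [("GENE1", "v1"), ("GENE2", "v5")],
}
known = {"v2"}

strict = filter_candidate_genes(calls, known)
relaxed = filter_candidate_genes(calls, known, min_carriers=1)
print(strict)
print(relaxed)
```

Requiring every affected individual to carry a novel variant in the same gene is the strictest form of the sharing filter; lowering `min_carriers` is one simple way to tolerate locus heterogeneity. Note that the sketch returns only counts and carries no measure of statistical uncertainty, which is the limitation the abstract attributes to filter-based methods.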
