Significance Thresholds for Rare Variant Signals

Large-scale DNA sequencing studies raise the question of how to estimate the significance threshold required for tests of association between the identified genetic variation and a phenotype of interest. Because most of the identified variants are rare, standard analytic practice now includes, in addition to single-variant tests, region-based tests that simultaneously consider all genetic variability in a small chosen region of the genome. However, the question of how to set appropriate genome-wide significance thresholds for these region-based tests has received little consideration. Controlling the family-wise error rate requires an estimate of the effective number of independent tests, m_e. For single-variant tests, m_e depends primarily on linkage disequilibrium; for region-based tests, the choice of regions, weights, and test statistics also influences m_e. Therefore, m_e needs to be estimated separately for each analytic plan. In this chapter, we review a recently proposed method that uses the patterns of correlation between test statistics to estimate the required significance thresholds. In this approach, extrapolation from small sections of the genome to the whole genome provides computationally feasible estimators of genome-wide significance thresholds. We also discuss other factors that may need consideration, such as exome sequencing, the use of false discovery rates for controlling type 1 errors, and region definitions that are not based on physical proximity.
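As a rough illustration of how correlation between test statistics translates into a threshold, the sketch below estimates an effective number of tests from a small correlation matrix and converts it to a Šidák-type per-test threshold. It is not the estimator reviewed in the chapter: it substitutes the simple eigenvalue-variance formula associated with Cheverud and Nyholt, and the function names (`effective_tests`, `sidak_threshold`, `extrapolate`) and the proportional scaling from a genome section to the whole genome are assumptions made only for this example.

```python
def effective_tests(corr):
    """Eigenvalue-variance estimate of the effective number of tests.

    `corr` is a symmetric correlation matrix (list of lists) between test
    statistics. Swapped-in formula, not the chapter's method:
    m_e = 1 + (M - 1) * (1 - Var(lambda) / M).
    """
    m = len(corr)
    if m == 1:
        return 1.0
    # For a symmetric matrix the sum of squared eigenvalues equals the sum
    # of squared entries, and the eigenvalues of a correlation matrix have
    # mean 1, so no eigendecomposition is needed for their sample variance.
    sum_sq = sum(r * r for row in corr for r in row)
    var_lambda = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def sidak_threshold(alpha, m_e):
    """Per-test threshold giving family-wise error rate `alpha` over m_e independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_e)


def extrapolate(m_e_section, n_section, n_genome):
    """Naive proportional scaling from one section to the genome (assumption of this sketch)."""
    return m_e_section * n_genome / n_section


# Hypothetical example: three region-based tests with pairwise correlation 0.5,
# taken from a section holding 3 of an assumed 20,000 genome-wide tests.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
m_e_section = effective_tests(corr)
m_e_genome = extrapolate(m_e_section, n_section=3, n_genome=20000)
threshold = sidak_threshold(0.05, m_e_genome)
```

The limiting cases behave as expected: uncorrelated tests give m_e equal to the number of tests, and perfectly correlated tests give m_e = 1. Any real analysis would need to recompute the correlation matrix for its own regions, weights, and test statistics.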
