Simultaneous Detection of Signal Regions With Applications in Genome-Wide Association Studies

In this paper, we consider the detection of signal regions associated with disease outcomes in genome-wide association studies (GWAS). Gene- or region-based methods have become increasingly popular in GWAS as a complement to traditional single-variant analysis. However, these methods test for association between an outcome and the genetic variants in a pre-specified region, such as a gene. In view of the massive intergenic regions in GWAS and the substantial interest in identifying signal regions for subsequent fine mapping, we propose a computationally efficient method based on a quadratic scan (Q-SCAN) statistic that detects both the existence and the locations of signal regions by scanning the genome continuously. The proposed method accounts for the correlation (linkage disequilibrium) among genetic variants. It allows signal regions to contain both signal and neutral variants, as well as signal variants whose effects are in different directions. We study the asymptotic properties of the proposed Q-SCAN statistics. We derive an asymptotic threshold that controls the family-wise error rate, and show that, under regularity conditions, the proposed method consistently selects the true signal regions. We perform simulation studies to evaluate the finite-sample performance of the proposed method. Our simulation results show that the proposed procedure outperforms existing methods, especially when signal regions contain signal variants whose effects are in different directions, are contaminated with neutral variants, or consist of correlated variants. We apply the proposed method to a lung cancer genome-wide association study to identify genetic regions associated with lung cancer risk.
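To make the scanning idea concrete, here is a minimal Python sketch built on our own assumptions; it is not the authors' implementation. We assume marginal score statistics U_j for p ordered variants with null covariance matrix Σ (the LD matrix). For a window I, we take the statistic to be the sum of squared scores, standardized by its null mean tr(Σ_I) and null variance 2 tr(Σ_I²). Because the scores are squared, effects in opposite directions do not cancel. Windows of lengths `l_min` to `l_max` are scanned and those exceeding a user-supplied `threshold` are kept. Overlaps are resolved greedily by keeping the window with the largest statistic. The function names and the `threshold` argument are hypothetical, and the paper's asymptotic family-wise error rate threshold is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Window = Tuple[float, int, int]  # (standardized statistic, start, end), end exclusive


def window_stat(u: Sequence[float], sigma: Sequence[Sequence[float]],
                start: int, end: int) -> float:
    """Standardized quadratic statistic for variants start..end-1 (assumed form)."""
    idx = range(start, end)
    q = sum(u[j] ** 2 for j in idx)  # sum of squared scores; sign-insensitive
    # If U_I ~ N(0, Sigma_I) under the null: E[Q] = tr(Sigma_I), Var[Q] = 2 tr(Sigma_I^2)
    mean = sum(sigma[j][j] for j in idx)
    var = 2.0 * sum(sigma[j][k] ** 2 for j in idx for k in idx)
    return (q - mean) / math.sqrt(var)


def scan_windows(u: Sequence[float], sigma: Sequence[Sequence[float]],
                 l_min: int, l_max: int) -> List[Window]:
    """Evaluate every contiguous window whose length lies in [l_min, l_max]."""
    p = len(u)
    out: List[Window] = []
    for length in range(l_min, l_max + 1):
        for start in range(0, p - length + 1):
            end = start + length
            out.append((window_stat(u, sigma, start, end), start, end))
    return out


def select_regions(u: Sequence[float], sigma: Sequence[Sequence[float]],
                   l_min: int, l_max: int, threshold: float) -> List[Window]:
    """Keep windows above threshold, then greedily drop overlapping ones."""
    candidates = [w for w in scan_windows(u, sigma, l_min, l_max) if w[0] > threshold]
    candidates.sort(key=lambda w: w[0], reverse=True)  # strongest windows first
    selected: List[Window] = []
    for stat, s, e in candidates:
        # Half-open windows [s, e) and [s2, e2) are disjoint iff e <= s2 or s >= e2
        if all(e <= s2 or s >= e2 for _, s2, e2 in selected):
            selected.append((stat, s, e))
    return sorted(selected, key=lambda w: w[1])  # report in genomic order


if __name__ == "__main__":
    p = 20
    u = [0.0] * p
    u[5:9] = [4.0, -4.0, 4.0, -4.0]  # signal variants with effects in both directions
    sigma = [[1.0 if j == k else 0.0 for k in range(p)] for j in range(p)]  # no LD, for simplicity
    # Selects the single window covering variants 5..8
    print(select_regions(u, sigma, l_min=2, l_max=6, threshold=5.0))
```

These brute-force loops recompute each window from scratch, so the cost grows roughly with p times the cube of `l_max`. A computationally efficient version would instead update the sums incrementally as the window slides along the genome.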