A hidden two-locus disease association pattern in genome-wide association studies

Background: Recent association analyses in genome-wide association studies (GWAS) focus mainly on single-locus association tests (marginal tests) and two-locus interaction detection. These methods have provided strong evidence of associations between genetic variants and complex diseases. However, there is a type of association pattern that often occurs within local regions of the genome and is unlikely to be detected by either marginal tests or interaction tests. This pattern involves a group of correlated single-nucleotide polymorphisms (SNPs). The correlation among the SNPs can lead to weak marginal effects, and interaction plays no role in the pattern. The phenomenon is due to unfaithfulness: because the correlation causes the effects to cancel, the marginal effects of correlated SNPs do not faithfully express their significant joint effect.

Results: In this paper, we develop a computational method to detect this association pattern masked by unfaithfulness. We have applied our method to seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). Examining all pairs of SNPs takes about one week per data set. Based on the empirical results from these real data, we show that this type of association masked by unfaithfulness exists widely in GWAS.

Conclusions: These newly identified associations enrich the discoveries of GWAS and may provide new insights into both the analysis of tagSNPs and the experimental design of GWAS. Because these associations are easily missed by existing analysis tools, we can connect only some of them to publicly available findings from other association studies. Because independent data sets are currently limited, we also have difficulty replicating these findings. Further biological implications need investigation.

Availability: The software is freely available at http://bioinformatics.ust.hk/hidden_pattern_finder.zip.
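The cancellation behind unfaithfulness can be illustrated with a small calculation. The sketch below is not the method developed in the paper: it assumes a toy linear model for a quantitative trait, y = b1*x1 + b2*x2 + noise, with two unit-variance predictors whose correlation is r. The function name `explained_variance` and the parameter values are hypothetical, chosen only for illustration. With opposite-signed effects and strong correlation, each predictor alone explains very little variance, while the pair explains much more.

```python
def explained_variance(b1, b2, r):
    """Toy model (an assumption, not the paper's case-control setting):
    y = b1*x1 + b2*x2 + noise, with var(x1) = var(x2) = 1 and corr(x1, x2) = r.

    Returns the variance of y explained by x1 alone, by x2 alone,
    and by x1 and x2 jointly.
    """
    # Covariance of y with each predictor; the correlated partner
    # contributes r times its own effect.
    cov_y_x1 = b1 + r * b2
    cov_y_x2 = b2 + r * b1

    # Regressing y on a single unit-variance predictor gives slope = covariance,
    # so the variance explained marginally is the squared covariance.
    marginal_x1 = cov_y_x1 ** 2
    marginal_x2 = cov_y_x2 ** 2

    # Variance of the full linear predictor b1*x1 + b2*x2.
    joint = b1 ** 2 + b2 ** 2 + 2 * r * b1 * b2
    return marginal_x1, marginal_x2, joint


# Hypothetical values: opposite effects on two strongly correlated predictors.
m1, m2, joint = explained_variance(b1=1.0, b2=-1.0, r=0.9)
print(f"marginal x1: {m1:.3f}, marginal x2: {m2:.3f}, joint: {joint:.3f}")
```

In this setting each marginal signal is (1 - r)^2 while the joint signal is 2(1 - r), so the gap between them widens as r approaches 1. A marginal scan would then rank both predictors low even though the pair carries a real signal, and because the model has no product term, an interaction test has nothing to detect either.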
