A hidden Markov random field model for genome-wide association studies.

Genome-wide association studies (GWAS) are increasingly used to identify novel genetic variants that confer susceptibility to complex traits, but there is little consensus on how such data should be analyzed. The most commonly used methods are single-SNP (single nucleotide polymorphism) analysis and haplotype analysis, with Bonferroni correction for multiple comparisons. Because the SNPs in a typical GWAS are often in linkage disequilibrium (LD), at least locally, the corresponding tests are correlated, so Bonferroni correction tends to be conservative and therefore lowers statistical power. In this paper, we propose a hidden Markov random field (HMRF) model for GWAS analysis, based on a weighted LD graph built from prior LD information among the SNPs, together with an efficient iterated conditional modes (ICM) algorithm for estimating the model parameters. The model makes effective use of the LD information when calculating the posterior probability that each SNP is associated with the disease. These posterior probabilities can then be used to define a procedure that controls the false discovery rate (FDR) when selecting disease-associated SNPs. Simulation studies demonstrated a potential gain in power over single-SNP analysis. The proposed method is especially effective at identifying SNPs that are only borderline significant in single-marker analysis but are in high LD with significant SNPs. In addition, by considering SNPs in LD jointly, the proposed method can also help reduce the number of false identifications of disease-associated SNPs. We demonstrate the application of the proposed HMRF model using data from a case-control GWAS of neuroblastoma and identify one new SNP that is potentially associated with neuroblastoma.
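The following sketch shows how an HMRF on an LD graph can be fitted by iterated conditional modes. It is a minimal, hypothetical illustration and not the authors' implementation. It rests on several assumptions that the abstract does not state. Each SNP has a single-marker z-statistic. A binary association label carries an Ising-type prior over a weighted LD graph `W`, for example pairwise r² values. Unassociated SNPs follow a null N(0, 1) distribution and associated SNPs follow an alternative N(0, s2) distribution. The prior parameters `gamma` and `beta` are fixed. The name `hmrf_icm`, the starting cut-off, and all default values are illustrative choices. Each sweep sets one SNP's label at a time to the mode of its conditional distribution given its neighbours, which is the core idea of Besag's ICM. The returned probabilities are conditional posteriors given the neighbours' estimated labels.

```python
import numpy as np


def _log_lr(z, x):
    """log N(z; 0, s2) - log N(z; 0, 1), with s2 re-estimated from labels x.

    Hypothetical choice: s2 is the mean z^2 of SNPs currently labelled 1,
    kept above 1; an arbitrary 4.0 is used when no SNP is labelled 1.
    """
    s2 = max(float(np.mean(z[x == 1] ** 2)), 1.01) if x.any() else 4.0
    return -0.5 * np.log(s2) + 0.5 * z**2 * (1.0 - 1.0 / s2)


def hmrf_icm(z, W, gamma=-2.0, beta=1.0, max_iter=50):
    """Hypothetical ICM sketch for a two-state HMRF on a weighted LD graph.

    Assumed model (illustrative, not necessarily the paper's):
      x_i in {0, 1}, where 1 means SNP i is disease-associated
      P(x) proportional to
          exp(gamma * sum_i x_i + beta * sum_{i<j} W_ij * [x_i == x_j])
      z_i | x_i = 0 ~ N(0, 1)  and  z_i | x_i = 1 ~ N(0, s2), s2 > 1
    gamma and beta are held fixed; only s2 is re-estimated.
    Returns the final labels and P(x_i = 1 | z_i, neighbour labels).
    """
    z = np.asarray(z, dtype=float)
    W = np.array(W, dtype=float)        # copy so the caller's matrix is untouched
    np.fill_diagonal(W, 0.0)            # a SNP is not its own neighbour
    x = (np.abs(z) > 3.0).astype(int)   # arbitrary starting labels: |z| > 3

    for _ in range(max_iter):
        log_lr = _log_lr(z, x)
        changed = False
        for i in range(len(z)):
            # Conditional log-odds of x_i = 1 given z_i and the current
            # neighbour labels (labels updated earlier in this sweep are used).
            log_odds = gamma + beta * np.dot(W[i], 2 * x - 1) + log_lr[i]
            new_label = int(log_odds > 0)
            if new_label != x[i]:
                x[i] = new_label
                changed = True
        if not changed:                 # labels are stable: ICM has converged
            break

    # Final conditional posteriors; 0.5 * (1 + tanh(t / 2)) is a stable logistic.
    log_odds = gamma + beta * (W @ (2 * x - 1)) + _log_lr(z, x)
    return x, 0.5 * (1.0 + np.tanh(0.5 * log_odds))


if __name__ == "__main__":
    z = [6.0, 2.5, 2.5, 0.1, -0.3]
    W = np.zeros((5, 5))
    W[0, 1] = W[1, 0] = 0.9   # SNP 1 is in high LD with the strong SNP 0
    W[3, 4] = W[4, 3] = 0.5
    labels, post = hmrf_icm(z, W)
    print(labels, np.round(post, 3))
```

In the toy example, SNPs 1 and 2 have the same borderline z-statistic of 2.5. Only SNP 1 is in strong LD with a clearly significant SNP, and the sketch labels SNP 1 as associated and gives it the higher posterior. This mirrors, on a toy scale, the borderline-SNP behaviour the abstract describes.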
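Posterior probabilities of association can be turned into a selected SNP set with a commonly used Bayesian FDR rule. Treat 1 - p_i as the posterior probability that SNP i is null (its local FDR) and rank the SNPs by this quantity. Then select the longest top-k list whose average local FDR is at most alpha. The abstract does not spell out its exact procedure, so this is only a sketch of that standard rule. It assumes the posteriors are well calibrated, and the name `select_by_posterior_fdr` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def select_by_posterior_fdr(post, alpha=0.05):
    """Select SNPs so the estimated (Bayesian) FDR of the selected set is <= alpha.

    post[i] is the posterior probability that SNP i is disease-associated, so
    1 - post[i] is its posterior probability of being null (its local FDR).
    Returns the selected SNP indices in ascending order.
    """
    lfdr = 1.0 - np.asarray(post, dtype=float)
    order = np.argsort(lfdr, kind="stable")  # most convincing SNPs first
    # Running mean of the sorted local FDRs = estimated FDR of each top-k list.
    est_fdr = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    passing = np.flatnonzero(est_fdr <= alpha)
    if passing.size == 0:
        return np.array([], dtype=int)
    k = passing[-1] + 1  # largest list size whose estimated FDR is <= alpha
    return np.sort(order[:k])


if __name__ == "__main__":
    print(select_by_posterior_fdr([0.1, 0.99, 0.5, 0.97], alpha=0.05))
```

The sorted local FDR values are non-decreasing, so their running mean is non-decreasing too. The selected set is therefore always a prefix of the ranking, and a single cumulative sum is enough to find it.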
