A simple Bayesian mixture model with a hybrid procedure for genome-wide association studies

Genome-wide association studies often run into one of two problems: either no influential markers are detected at all because the multiple-testing correction is too stringent, or the importance of markers is hard to quantify from their P-values alone. Advocates of estimation procedures prefer to estimate the proportion of associated markers rather than test significance, so as to avoid overinterpretation. Here, we adopt a Bayesian hierarchical mixture model to estimate the proportion of influential markers directly, and then proceed to a selection procedure based on the Bayes factor (BF). This mixture model accommodates different sources of dependence in the data through only a few parameters. Specifically, we focus on a standardized risk measure with unit variance, so that fewer parameters are involved in inference. The expected value of this measure follows a mixture distribution whose mixing probability is the probability of association, and the measure is robust to minor allele frequencies. To select promising markers, we use the magnitude of the BF to represent the strength of evidence in support of an association between marker and disease. We demonstrate this procedure both with simulations and with SNP data from studies on rheumatoid arthritis, coronary artery disease, and Crohn's disease obtained from the Wellcome Trust Case–Control Consortium. In these comparisons, the Bayesian procedure outperforms other existing methods in terms of accuracy, power, and computational efficiency. The R code that implements this method is available at http://homepage.ntu.edu.tw/~ckhsiao/Bmix/Bmix.htm.
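The two-step idea (estimate the proportion of associated markers, then rank markers by BF) can be illustrated with a minimal sketch. This is not the authors' hierarchical model or their R implementation. It assumes a simplified two-component normal mixture for a unit-variance standardized statistic z: null markers follow N(0, 1), and associated markers have a mean drawn from N(0, tau2), so that marginally z follows N(0, 1 + tau2). The prior variance `tau2`, the EM estimator, and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def normal_pdf(x, var):
    """Density at x of a zero-mean normal with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def bayes_factor(z, tau2):
    """BF of association vs. null under the assumed model:
    N(z; 0, 1 + tau2) / N(z; 0, 1)."""
    return normal_pdf(z, 1.0 + tau2) / normal_pdf(z, 1.0)


def estimate_proportion(zs, tau2, n_iter=200):
    """EM estimate of the mixing probability p, with tau2 held fixed
    (an assumption of this sketch)."""
    bfs = [bayes_factor(z, tau2) for z in zs]
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each marker is associated.
        resp = [p * bf / (p * bf + 1.0 - p) for bf in bfs]
        # M-step: the updated p is the mean posterior probability.
        p = sum(resp) / len(resp)
    return p


def simulate(n, p_true, tau2, seed=0):
    """Draw n standardized statistics from the assumed mixture."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n):
        var = 1.0 + tau2 if rng.random() < p_true else 1.0
        zs.append(rng.gauss(0.0, math.sqrt(var)))
    return zs


# Simulated data only; no real SNP data are used here.
zs = simulate(n=10000, p_true=0.05, tau2=9.0)
p_hat = estimate_proportion(zs, tau2=9.0)

# Rank markers by BF and keep the five with the strongest evidence.
ranked = sorted(range(len(zs)), key=lambda i: bayes_factor(zs[i], 9.0), reverse=True)
top_markers = ranked[:5]
print("estimated proportion:", round(p_hat, 4))
print("top marker indices:", top_markers)
```

In this simplified setting the BF is a monotone function of |z|, so the ranking matches a ranking by P-value; the gain from the mixture view in this sketch is the direct estimate of the proportion of associated markers and an evidence scale that does not depend on a significance threshold.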
