Enriching the analysis of genomewide association studies with hierarchical modeling

Genomewide association studies (GWAs) initially investigate hundreds of thousands of single-nucleotide polymorphisms (SNPs). The most promising SNPs are then evaluated in additional subjects, either for replication or for a joint analysis. Deciding which SNPs merit follow-up is one of the most crucial aspects of these studies. Here we present an approach for selecting the most promising SNPs that incorporates both conventional results and other existing information about the SNPs into a hierarchical model. The model is developed for general use, its potential value is demonstrated in an application, and tools are provided for undertaking hierarchical modeling. By quantitatively harnessing all available information in GWAs, hierarchical modeling may more clearly distinguish true causal variants from noise.
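The abstract does not state the model's equations. One common two-stage formulation of this idea works as follows. First-stage estimates $\hat\beta_j$ (for example, per-SNP log odds ratios with standard errors $s_j$) are regressed in a second stage on SNP-level prior covariates $Z_j$, such as functional annotation or conservation indicators. Each estimate is then shrunk toward its fitted prior mean $Z_j\hat\pi$ using a fixed prior variance $\tau^2$, which is the "semi-Bayes" variant. The minimal sketch below assumes that formulation. The function names, the covariate matrix `Z`, and the value of `tau2` are hypothetical choices for illustration. This is not the authors' software.

```python
import numpy as np


def hierarchical_shrink(beta_hat, se, Z, tau2):
    """Semi-Bayes two-stage shrinkage (illustrative sketch, assumed formulation).

    beta_hat : first-stage per-SNP effect estimates, shape (m,)
    se       : their standard errors, shape (m,)
    Z        : hypothetical second-stage design matrix of SNP covariates, shape (m, p)
    tau2     : assumed fixed prior variance of true effects around Z @ pi
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    Z = np.asarray(Z, dtype=float)

    # Marginal variance of each estimate under the second stage is v_j + tau2.
    w = 1.0 / (v + tau2)

    # Second-stage weighted least squares: pi_hat = (Z' W Z)^{-1} Z' W beta_hat.
    ZtW = Z.T * w
    pi_hat = np.linalg.solve(ZtW @ Z, ZtW @ beta_hat)
    prior_mean = Z @ pi_hat

    # Shrinkage factor B_j = v_j / (v_j + tau2): noisier estimates shrink more.
    B = v * w
    beta_post = prior_mean + (1.0 - B) * (beta_hat - prior_mean)

    # Approximate posterior variance; ignores uncertainty in pi_hat.
    se_post = np.sqrt((1.0 - B) * v)
    return beta_post, se_post, pi_hat


def rank_for_followup(beta_post, se_post, k):
    """Indices of the k SNPs with the largest |posterior estimate / SE|."""
    z = beta_post / se_post
    return np.argsort(-np.abs(z))[:k]


if __name__ == "__main__":
    # Toy data: four SNPs, intercept-only second stage, equal standard errors.
    b, s, p = hierarchical_shrink([0.5, 0.0, -0.1, 0.2], [0.1] * 4, np.ones((4, 1)), 0.01)
    print(b, s, p, rank_for_followup(b, s, 2))
```

With an intercept-only `Z`, every estimate is pulled halfway toward the common mean (here $\tau^2 = s_j^2$, so $B_j = 0.5$). Adding annotation columns to `Z` lets SNPs with supportive prior information shrink toward a larger prior mean, which can change which SNPs rank highest for follow-up.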