Bayesian Methods in Fisher’s Statistical Genetics World

Statistical genetics is a scientific discipline that covers any statistical analysis of genetic data. The interplay between statistics and genetics has a long history, dating back almost a century to Fisher's seminal work confirming the chromosomal theory of inheritance (Piegorsch, 1990). Recent advances in genotyping technologies (i.e., technologies for collecting genetic data) have produced vast amounts of data, offering statisticians great opportunities for methodological development, implementation and application. For example, the high-dimensional genome-wide association studies conducted in the last few years have spurred a large body of novel statistical methods for dissecting the genetic architecture of complex human traits; see, e.g., Thomas et al. (2009) and Begum et al. (2012). The types of genetic data available are diverse; they include microsatellites, single-nucleotide polymorphisms, copy number variations, DNA methylation, and gene expression. The corresponding statistical methodologies are equally diverse, as illustrated by Bull et al. in Chapter 8.

For clarity and a more focused discussion, this expository piece centers on studies of genetic association between single-nucleotide polymorphisms and heritable human traits. We first introduce the relevant genetic terminology. We then formulate genetic association studies as regression models in which inference on the regression coefficients is of interest. Using a published genome-wide association study as an example, we describe the commonly used frequentist approaches to testing and estimation, and then discuss alternative Bayesian methods together with their advantages and challenges. We conclude with a discussion of other recent developments in Bayesian statistical genetics, focusing on contributions by Canadian statisticians, and comment on future directions.

[1] F. Dudbridge et al. Estimation of significance thresholds for genomewide association scans. Genetic Epidemiology, 2008.

[2] Andriy Derkach et al. Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results. 2012.

[3] Radu V. Craiu et al. Bayesian methods to overcome the winner’s curse in genetic studies. arXiv:0907.2770, 2009.

[4] Kenny Q. Ye et al. An integrated map of genetic variation from 1,092 human genomes. Nature, 2012.

[5] W. Piegorsch. Fisher's contributions to genetics and heredity, with special emphasis on the Gregor Mendel controversy. Biometrics, 1990.

[6] B. Efron. Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 2011.

[7] Jon Wakefield et al. Bayes factors for genome-wide association studies: comparison with P-values. Genetic Epidemiology, 2009.

[8] M. Carine et al. Approximate Bayesian Computation Reveals the Crucial Role of Oceanic Islands for the Assembly of Continental Biodiversity. Systematic Biology, 2015.

[9] Xihong Lin et al. Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 2012.

[10] L. Excoffier et al. Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood. Genetics, 2009.

[11] P. Donnelly et al. Inferring coalescence times from DNA sequence data. Genetics, 1997.

[12] J. Row et al. Approximate Bayesian computation reveals the factors that influence genetic diversity and population structure of foxsnakes. Journal of Evolutionary Biology, 2011.

[13] Fei Zou et al. Estimating odds ratios in genome scans: an approximate conditional likelihood approach. American Journal of Human Genetics, 2008.

[14] W. G. Hill et al. Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits. PLoS Genetics, 2008.

[15] Hongyu Zhao et al. Empirical Bayes Correction for the Winner's Curse in Genetic Association Studies. Genetic Epidemiology, 2013.

[16] Radu V. Craiu et al. Bayesian Computation Via Markov Chain Monte Carlo. 2014.

[17] Kesheng Wang et al. A Bayesian segmentation approach to ascertain copy number variations at the population level. Bioinformatics, 2009.

[18] Raphael Gottardo et al. Flexible empirical Bayes models for differential gene expression. Bioinformatics, 2007.

[19] P. Donnelly et al. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics, 2007.

[20] L. Wasserman et al. Computing Bayes Factors by Combining Simulation and Asymptotic Approximations. 1997.

[21] G. Tseng et al. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Research, 2012.

[22] Simon C. Potter et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007.

[23] Alan Agresti et al. Categorical Data Analysis. 2003.

[24] Raphael Gottardo et al. A Flexible and Powerful Bayesian Hierarchical Model for ChIP–Chip Experiments. Biometrics, 2008.

[25] A. Agresti et al. Categorical Data Analysis. International Encyclopedia of Statistical Science, 1991.

[26] M. Stephens et al. Bayesian statistical methods for genetic association studies. Nature Reviews Genetics, 2009.

[27] A. Agresti. Analysis of Ordinal Categorical Data. 2010.

[28] R. Gottardo et al. An Integrated Hierarchical Bayesian Model for Multivariate eQTL Mapping. Statistical Applications in Genetics and Molecular Biology, 2012.

[29] Juan Pablo Lewinger et al. Methodological Issues in Multistage Genome-wide Association Studies. Statistical Science, 2009.

[30] Jon Wakefield et al. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. American Journal of Human Genetics, 2007.

[31] E. Zeggini et al. Ranking of genome-wide association scan signals by different measures. International Journal of Epidemiology, 2009.

[32] Lei Sun et al. Robust and Powerful Tests for Rare Variants Using Fisher's Method to Combine Evidence of Association From Two or More Complementary Tests. Genetic Epidemiology, 2013.

[34] Nengjun Yi et al. Bayesian analysis of rare variants in genetic association studies. Genetic Epidemiology, 2011.