Two-Stage Bayesian Approach for GWAS With Known Genealogy

ABSTRACT Genome-wide association studies (GWAS) aim to assess relationships between single nucleotide polymorphisms (SNPs) and diseases. They are among the most widely studied problems in genetics, and they pose particular challenges because the number of SNPs is large compared with the number of subjects in the study. Moreover, individuals might not be independent, especially in animal breeding studies or in studies of genetic diseases in isolated populations with highly inbred individuals. We propose a family-based GWAS model with a two-stage approach comprising a dimension reduction step and a subsequent model selection step. The first stage, in which the genetic relatedness between the subjects is taken into account, selects the promising SNPs. The second stage uses Bayes factors to compare all candidate models and a random search strategy to explore the space of all regression models in a fully Bayesian approach. A simulation study shows that our approach is superior to the Bayesian lasso for model selection in this setting. We also illustrate its performance in a study of beta-thalassemia in an isolated population from Sardinia. Supplementary Material describing the implementation of the proposed method is available online.
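
The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several ingredients are assumptions made only for illustration: relatedness is handled by whitening with a known relationship matrix K and a fixed heritability h2, stage one ranks SNPs by marginal correlation, and stage two replaces exact Bayes factors with the BIC approximation inside a simple Metropolis-style random search. All function names (whiten, screen, bic, random_search) and the simulated data are hypothetical.

```python
import numpy as np

def whiten(K, h2, *arrays):
    """Decorrelate related subjects, assuming Cov(y) is proportional to
    h2*K + (1-h2)*I with K and h2 known (an assumption of this sketch)."""
    V = h2 * K + (1.0 - h2) * np.eye(K.shape[0])
    L = np.linalg.cholesky(V)
    return [np.linalg.solve(L, a) for a in arrays]

def screen(y, X, base, keep):
    """Stage 1: keep the SNPs most correlated with y after projecting out
    the (whitened) intercept column `base`."""
    yr = y - base * (base @ y) / (base @ base)
    Xr = X - np.outer(base, base @ X) / (base @ base)
    score = np.abs(Xr.T @ yr) / (np.linalg.norm(Xr, axis=0) * np.linalg.norm(yr))
    return np.argsort(score)[::-1][:keep]

def bic(y, X, base, subset):
    """BIC of the linear model containing the intercept and the SNPs in `subset`."""
    Z = np.column_stack([base] + [X[:, j] for j in subset])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    rss = np.sum((y - Z @ coef) ** 2)
    n = len(y)
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

def random_search(y, X, base, candidates, n_iter=2000, seed=0):
    """Stage 2: random walk over models; exp(-dBIC/2) stands in for the
    Bayes factor of the proposed model against the current one."""
    rng = np.random.default_rng(seed)
    current = frozenset()
    cur_bic = bic(y, X, base, sorted(current))
    best, best_bic = current, cur_bic
    for _ in range(n_iter):
        j = int(rng.choice(candidates))
        proposal = current ^ {j}          # add or drop one candidate SNP
        prop_bic = bic(y, X, base, sorted(proposal))
        if np.log(rng.uniform()) < -(prop_bic - cur_bic) / 2.0:
            current, cur_bic = proposal, prop_bic
            if cur_bic < best_bic:
                best, best_bic = current, cur_bic
    return sorted(best)

# Simulated data: 75 sibships of size 4, 500 SNPs, 3 causal SNPs.
# Genotypes are drawn independently for simplicity, which real siblings would not be.
rng = np.random.default_rng(1)
n, p, h2 = 300, 500, 0.4
causal = [3, 50, 120]
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
K = np.kron(np.eye(n // 4), 0.5 * np.eye(4) + 0.5 * np.ones((4, 4)))
g = np.sqrt(h2) * np.linalg.cholesky(K) @ rng.standard_normal(n)
y = 1.5 * X[:, causal].sum(axis=1) + g + np.sqrt(1 - h2) * rng.standard_normal(n)

y_w, X_w, one_w = whiten(K, h2, y, X, np.ones(n))
candidates = screen(y_w, X_w, one_w, keep=20)
selected = random_search(y_w, X_w, one_w, candidates)
print(selected)
```

The screening step shrinks the model space from 2^500 to 2^20 models, which is what makes the stochastic search in the second step feasible. In practice h2 would have to be estimated, and the BIC shortcut would be replaced by Bayes factors computed under a chosen prior.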
