Analysis of Next Generation Sequencing Data Using Integrated Nested Laplace Approximation (INLA)

Integrated Nested Laplace Approximation (INLA), implemented in the R package r-inla, is a versatile methodology for the Bayesian analysis of next generation sequencing count data: it can account for zero-inflation, random effects and correlation across genomic features. We demonstrate its use and provide some insight into its approximations of marginal posteriors. In high-dimensional settings such as these, INLA is particularly attractive in combination with empirical Bayes. We show how to apply this by estimating priors from the output of INLA. We extend this methodology to the estimation of joint priors for a limited number of parameters, which yields multivariate shrinkage. Joint priors are useful for appropriate inference when two or more parameters are likely to be strongly correlated. Two examples serve as illustrations: (1) joint inference for differential zero-inflation and means between two groups; (2) correlated group effects on mRNA expression. For both simulated and real data we show that multivariate shrinkage may lead to improved marker selection. We end with a discussion of the use of this INLA-based method within the spectrum of other available methods.
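
To make the notion of zero-inflation concrete, the following is a minimal sketch of a zero-inflated count likelihood. It is not the INLA approximation itself, nor code from the r-inla package. The choice of a negative binomial count distribution and the names `mu`, `size` and `pi0` are assumptions made for illustration only; the abstract does not specify the observation model.

```python
import math


def nb_log_pmf(y, mu, size):
    """Log pmf of a negative binomial with mean mu and variance mu + mu^2 / size.

    The (mu, size) parametrization is an illustrative assumption.
    """
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def zinb_pmf(y, mu, size, pi0):
    """Zero-inflated negative binomial pmf.

    With probability pi0 the count is a structural zero; otherwise it is
    drawn from the negative binomial distribution.
    """
    point_mass_at_zero = 1.0 if y == 0 else 0.0
    return pi0 * point_mass_at_zero + (1.0 - pi0) * math.exp(nb_log_pmf(y, mu, size))


def zinb_log_likelihood(counts, mu, size, pi0):
    """Log-likelihood of independent counts sharing the same parameters."""
    return sum(math.log(zinb_pmf(y, mu, size, pi0)) for y in counts)
```

In a setting such as the first illustration, one would compare both `pi0` and `mu` between two groups. The sketch only evaluates the likelihood for fixed parameter values; priors, random effects and posterior approximation are outside its scope.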
