ShrinkBayes: a versatile R-package for analysis of count-based sequencing data in complex study designs

Background: Complex designs are common in (observational) clinical studies. Sequencing data for such studies are produced increasingly often, which poses challenges for the analysis, such as an excess of zeros, the presence of random effects, and multi-parameter inference. Moreover, when sample sizes are small, inference in a Bayesian setting is likely to be too liberal when an inappropriate prior is applied, or to lack power when information is not carefully borrowed across features.

Results: We show on microRNA sequencing data from a clinical cancer study how our software ShrinkBayes tackles the aforementioned challenges. In addition, we illustrate its comparatively good performance on multi-parameter inference for groups using a data-based simulation. Finally, in the small sample size setting, we demonstrate its high power and improved FDR estimation through the use of Gaussian mixture priors that include a point mass.

Conclusion: ShrinkBayes is a versatile software package for the analysis of count-based sequencing data, which is particularly useful for studies with small sample sizes or complex designs.
