samExploreR: exploring reproducibility and robustness of RNA-seq results based on SAM files

MOTIVATION Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results are often unclear. This is in part attributed to the two counteracting goals of (i) a cost-efficient and (ii) an optimal experimental design, which lead to a compromise, e.g. in the sequencing depth of experiments. RESULTS We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequencing-depth-related questions for the experimental design. Overall, this provides a systematic way of exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes. AVAILABILITY AND IMPLEMENTATION samExploreR is available as an R package from Bioconductor. CONTACT v@bio-complexity.com SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
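The subsampling idea can be illustrated with a minimal sketch. It is not the samExploreR implementation or its API: the function name `subsample_sam`, the `fraction` and `seed` parameters, and the choice to draw m = round(fraction × n) alignment records without replacement are all assumptions made for illustration. Header lines (those starting with `@`) are kept unchanged, and the relative order of the retained records is preserved.

```python
import random


def subsample_sam(lines, fraction, seed=None):
    """Return the SAM header plus a random subset of the alignment records.

    Hypothetical helper for illustration only; not part of samExploreR.
    Assumption: m = round(fraction * n) records are drawn without replacement.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # In the SAM format, header lines start with '@'; all others are records.
    header = [ln for ln in lines if ln.startswith("@")]
    records = [ln for ln in lines if not ln.startswith("@")]

    m = round(fraction * len(records))
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible

    # Sample record indices, then sort them to keep the original read order.
    keep = sorted(rng.sample(range(len(records)), m))
    return header + [records[i] for i in keep]


# Toy in-memory SAM content: 2 header lines and 10 made-up alignment records.
toy_sam = ["@HD\tVN:1.6\tSO:unsorted", "@SQ\tSN:chr1\tLN:1000"] + [
    f"read{i}\t0\tchr1\t{10 * i + 1}\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
    for i in range(10)
]

half_depth = subsample_sam(toy_sam, fraction=0.5, seed=1)
```

Repeating such a draw with different seeds at several values of `fraction`, and rerunning the downstream analysis on each subsample, is one way to examine how stable a result is as the sequencing depth decreases.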
