NASA GeneLab RNA-seq consensus pipeline: standardized processing of short-read RNA-seq data

With the development of transcriptomic technologies, we can now precisely quantify changes in gene expression profiles from astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups (AWGs) have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping, and gene quantification steps, culminating in the detection of differentially expressed genes. This data analysis pipeline and the results of its execution on data submitted to GeneLab are publicly available through the GeneLab database. We present here the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility, and reusability of pipeline data, to provide a template for data processing of future spaceflight-relevant datasets, and to encourage cross-analysis of data from other databases with the data available in GeneLab.
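The stage order described above (quality control, trimming, mapping, gene quantification) can be sketched as an ordered list of command lines for one paired-end sample. This is a minimal illustration, not the pipeline's actual configuration: the abstract names only the stages, so the tool choices (FastQC, Trim Galore, STAR, RSEM), the flags, the directory layout, and the function name `build_commands` are all assumptions made for the example. Differential expression testing would follow in a separate step and is not shown.

```python
from pathlib import Path

# Stage order taken from the abstract; the tools chosen for each stage are assumptions.
STAGES = ["raw_qc", "trim", "trimmed_qc", "align", "quantify"]


def build_commands(sample, r1, r2, star_index, rsem_ref, out_dir):
    """Return (stage, argv) pairs for one paired-end sample (hypothetical layout)."""
    out = Path(out_dir)
    # Assumed Trim Galore paired-end output names when --basename is given
    # and the inputs are gzipped.
    t1 = out / "trim" / f"{sample}_val_1.fq.gz"
    t2 = out / "trim" / f"{sample}_val_2.fq.gz"
    star_prefix = out / "align" / f"{sample}_"
    # STAR writes transcriptome-space alignments here when
    # --quantMode TranscriptomeSAM is set.
    tx_bam = f"{star_prefix}Aligned.toTranscriptome.out.bam"
    return [
        ("raw_qc", ["fastqc", "-o", str(out / "raw_qc"), r1, r2]),
        ("trim", ["trim_galore", "--paired", "--basename", sample,
                  "-o", str(out / "trim"), r1, r2]),
        ("trimmed_qc", ["fastqc", "-o", str(out / "trimmed_qc"), str(t1), str(t2)]),
        ("align", ["STAR", "--genomeDir", star_index,
                   "--readFilesIn", str(t1), str(t2),
                   "--readFilesCommand", "zcat",
                   "--quantMode", "TranscriptomeSAM",
                   "--outSAMtype", "BAM", "Unsorted",
                   "--outFileNamePrefix", str(star_prefix)]),
        ("quantify", ["rsem-calculate-expression", "--paired-end", "--alignments",
                      tx_bam, rsem_ref, str(out / "quantify" / sample)]),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; all paths are placeholders.
    for stage, argv in build_commands("sample1", "sample1_R1.fastq.gz",
                                      "sample1_R2.fastq.gz", "star_index",
                                      "rsem_ref/ref", "results"):
        print(f"[{stage}] {' '.join(argv)}")
```

Keeping each stage as a plain argument list makes the processing order explicit and lets the commands be logged alongside the results, which is in the spirit of the reproducibility goal stated above.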
