From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline

In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification is conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

[1]  Yidong Chen,et al.  A novel significance score for gene selection and ranking , 2014, Bioinform..

[2]  Gordon K. Smyth,et al.  Testing significance relative to a fold-change threshold is a TREAT , 2009, Bioinform..

[3]  G. Smyth,et al.  ROBUST HYPERPARAMETER ESTIMATION PROTECTS AGAINST HYPERVARIABLE GENES AND IMPROVES POWER TO DETECT DIFFERENTIAL EXPRESSION. , 2016, The annals of applied statistics.

[4]  Wei Shi,et al.  featureCounts: an efficient general purpose program for assigning sequence reads to genomic features , 2013, Bioinform..

[5]  Steven P Lund,et al.  Statistical Applications in Genetics and Molecular Biology Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates , 2012 .

[6]  Charity W. Law,et al.  voom: precision weights unlock linear model analysis tools for RNA-seq read counts , 2014, Genome Biology.

[7]  S. Dudoit,et al.  Normalization of RNA-seq data using factor analysis of control genes or samples , 2014, Nature Biotechnology.

[8]  Matthew E. Ritchie,et al.  limma powers differential expression analyses for RNA-sequencing and microarray studies , 2015, Nucleic acids research.

[9]  Björn Usadel,et al.  Trimmomatic: a flexible trimmer for Illumina sequence data , 2014, Bioinform..

[10]  Aaron T. L. Lun,et al.  De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly , 2014, Nucleic acids research.

[11]  G. Barton,et al.  How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? , 2015, RNA.

[12]  Gordon K Smyth,et al.  Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2004, Statistical applications in genetics and molecular biology.

[13]  Gordon K Smyth,et al.  The use of miRNA microarrays for the analysis of cancer samples with global miRNA decrease , 2013, RNA.

[14]  J. Marioni,et al.  Pooling across cells to normalize single-cell RNA sequencing data with many zero counts , 2016, Genome Biology.

[15]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[16]  Raphael Gottardo,et al.  Orchestrating high-throughput genomic analysis with Bioconductor , 2015, Nature Methods.

[17]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[18]  Elgene Lim,et al.  Open Access Research Article Transcriptome Analyses of Mouse and Human Mammary Cell Subpopulations Reveal Multiple Conserved Genes and Pathways , 2022 .

[19]  M. Robinson,et al.  Small-sample estimation of negative binomial dispersion, with applications to SAGE data. , 2007, Biostatistics.

[20]  W. Shi,et al.  The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote , 2013, Nucleic acids research.

[21]  Wei Shi,et al.  featureCounts: an efficient general-purpose read summarization program , 2013 .

[22]  Mark D. Robinson,et al.  Moderated statistical tests for assessing differences in tag abundance , 2007, Bioinform..

[23]  Di Wu,et al.  ROAST: rotation gene set tests for complex microarray experiments , 2010, Bioinform..

[24]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[25]  Aaron T. L. Lun,et al.  It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR , 2016, Statistical Genomics.

[26]  Aaron T. L. Lun,et al.  diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data , 2015, BMC Bioinformatics.

[27]  M. Robinson,et al.  A scaling normalization method for differential expression analysis of RNA-seq data , 2010, Genome Biology.

[28]  Nuno A. Fonseca,et al.  RNA-Seq Gene Profiling - A Systematic Empirical Comparison , 2014, bioRxiv.

[29]  Davis J. McCarthy,et al.  Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation , 2012, Nucleic acids research.

[30]  Wolfgang Huber,et al.  RNA-Seq workflow: gene-level exploratory analysis and differential expression , 2015, F1000Research.

[31]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[32]  Aaron T. L. Lun,et al.  EGF-mediated induction of Mcl-1 at the switch to lactation is essential for alveolar cell survival , 2015, Nature Cell Biology.

[33]  Susan R. Wilson,et al.  Error estimates for the analysis of differential expression from RNA-seq count data , 2014, PeerJ.