Gene-level differential analysis at transcript-level resolution

Compared with transcript-level differential analysis of RNA-sequencing data, gene-level differential expression analysis is more robust and experimentally actionable. However, performing statistical analysis on gene counts can mask transcript-level dynamics. We demonstrate that ‘analysis first, aggregation second,’ in which p values from transcript-level analysis are aggregated to obtain gene-level results, increases sensitivity and accuracy. The method we propose can also be applied to transcript compatibility counts obtained from pseudoalignment of reads, an approach that circumvents the need for quantification and is fast, accurate, and model-free. The method generalizes to other levels of biological organization, and we showcase an application to Gene Ontology terms.
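The abstract does not say which rule combines transcript p values into a gene p value. As a minimal sketch, assuming Fisher's combined probability test (one standard choice; weighted generalizations such as Lancaster's method also exist) and, for comparison, a Šidák-adjusted minimum p value, the "aggregation second" step might look like the following. The helper names, transcript and gene identifiers, and p values are hypothetical, and the sketch assumes each transcript maps to exactly one gene.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.

    With even degrees of freedom the chi-square survival function has a
    closed form, so no SciPy is needed:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if not pvalues:
        raise ValueError("need at least one p value")
    if min(pvalues) <= 0.0:
        return 0.0  # a zero p value drives the product (and the result) to 0
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, series = 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half_x / j  # (x/2)^j / j!, built incrementally
        series += term
    return min(1.0, math.exp(-half_x) * series)


def sidak_combine(pvalues: Sequence[float]) -> float:
    """Sidak-adjusted minimum p value: 1 - (1 - min p)^k, computed stably."""
    if not pvalues:
        raise ValueError("need at least one p value")
    p_min, k = min(pvalues), len(pvalues)
    if p_min <= 0.0:
        return 0.0
    if p_min >= 1.0:
        return 1.0
    # -expm1(k * log1p(-p)) equals 1 - (1 - p)^k without cancellation for tiny p
    return -math.expm1(k * math.log1p(-p_min))


def aggregate_to_genes(
    transcript_pvalues: Dict[str, float],
    transcript_to_gene: Dict[str, str],
    combine: Callable[[Sequence[float]], float] = fisher_combine,
) -> Dict[str, float]:
    """Group transcript p values by gene, then combine each group into one p value."""
    by_gene: Dict[str, List[float]] = defaultdict(list)
    for tx, p in transcript_pvalues.items():
        by_gene[transcript_to_gene[tx]].append(p)
    return {gene: combine(ps) for gene, ps in by_gene.items()}


# Hypothetical transcript-level results and transcript-to-gene map.
tx_p = {"txA1": 0.01, "txA2": 0.04, "txB1": 0.30, "txB2": 0.70, "txB3": 0.02}
tx2gene = {"txA1": "geneA", "txA2": "geneA",
           "txB1": "geneB", "txB2": "geneB", "txB3": "geneB"}

gene_fisher = aggregate_to_genes(tx_p, tx2gene)
gene_sidak = aggregate_to_genes(tx_p, tx2gene, combine=sidak_combine)
print(gene_fisher)
print(gene_sidak)
```

Passing the combining rule as a parameter keeps the grouping logic separate from the statistics. The two rules behave differently: Fisher's method rewards genes where several transcripts each show modest evidence, whereas the Šidák rule is driven by the single strongest transcript. The one-gene-per-transcript assumption does not hold in general for transcript compatibility counts, whose equivalence classes can be compatible with transcripts from more than one gene.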
