stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage

RNA sequencing studies with complex designs and transcript-resolution analyses involve multiple hypotheses per gene; however, conventional approaches fail to control the false discovery rate (FDR) at the gene level. We propose stageR, a two-stage testing paradigm that leverages the increased power of aggregated gene-level tests and allows post hoc assessment of the individual hypotheses for significant genes. This method provides gene-level FDR control and boosts power for testing interaction effects. In transcript-level analysis, it provides a framework that performs powerful gene-level tests while maintaining biological interpretation at transcript-level resolution. The procedure is applicable whenever individual hypotheses can be aggregated, providing a unified framework for complex high-throughput experiments.
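The two-stage idea can be illustrated with a minimal Python sketch. It is not the stageR package interface, and the abstract does not spell out the corrections used at each stage, so the choices here are assumptions: the screening stage applies the Benjamini-Hochberg procedure to one aggregated p-value per gene, and the confirmation stage applies Holm's procedure within each selected gene at the reduced level alpha * R / G, where R is the number of selected genes and G the number of genes screened. All function names and p-values are hypothetical.

```python
from typing import Dict, List, Tuple


def bh_reject(pvals: Dict[str, float], alpha: float) -> List[str]:
    """Benjamini-Hochberg step-up: keys rejected at FDR level alpha."""
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ordered)
    k = 0  # largest rank i with p_(i) <= alpha * i / m
    for i, (_, p) in enumerate(ordered, start=1):
        if p <= alpha * i / m:
            k = i
    return [name for name, _ in ordered[:k]]


def holm_reject(pvals: Dict[str, float], alpha: float) -> List[str]:
    """Holm step-down: keys rejected with familywise error rate alpha."""
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = []
    for i, (name, p) in enumerate(ordered):
        if p <= alpha / (m - i):
            rejected.append(name)
        else:
            break  # stop at the first non-rejection
    return rejected


def stagewise_test(
    gene_p: Dict[str, float],
    hyp_p: Dict[str, Dict[str, float]],
    alpha: float = 0.05,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Screen genes, then confirm individual hypotheses in selected genes."""
    if not gene_p:
        return [], {}
    # Stage 1 (assumed): BH on the aggregated gene-level p-values.
    selected = bh_reject(gene_p, alpha)
    # Stage 2 (assumed): Holm within each selected gene at alpha * R / G.
    level = alpha * len(selected) / len(gene_p)
    confirmed = {g: holm_reject(hyp_p[g], level) for g in selected}
    return selected, confirmed


# Hypothetical p-values: four genes with two transcripts each.
gene_p = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.6, "geneD": 0.9}
hyp_p = {
    "geneA": {"tx1": 0.0005, "tx2": 0.3},
    "geneB": {"tx1": 0.02, "tx2": 0.04},
    "geneC": {"tx1": 0.5, "tx2": 0.7},
    "geneD": {"tx1": 0.8, "tx2": 0.95},
}
selected, confirmed = stagewise_test(gene_p, hyp_p, alpha=0.05)
print(selected)   # ['geneA', 'geneB']
print(confirmed)  # {'geneA': ['tx1'], 'geneB': []}
```

In this toy example geneB passes the screening stage, yet neither of its transcripts survives confirmation at the reduced level of 0.025. A gene can therefore be reported as significant even when no single transcript can be singled out.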
