Terminus enables the discovery of data-driven, robust transcript groups from RNA-seq data

Motivation: Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects.

Results: We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. Transcripts that share large numbers of ambiguously-mapping fragments with other transcripts, in complex patterns, often cannot have their abundances confidently estimated. Yet, the total transcriptional output of that group of transcripts will have greatly-reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. Our approach, implemented in the tool terminus, groups together transcripts in a data-driven manner allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result.

Availability: Terminus is implemented in Rust, and is freely-available and open-source. It can be obtained from https://github.com/COMBINE-lab/Terminus.

Contact: rob@cs.umd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
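The claim in the Results paragraph, that a group's total output can be far less uncertain than the abundance of any single member, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the replicate values are invented, and the coefficient of variation is a generic stand-in for an uncertainty measure, not the metric or the grouping procedure that terminus itself uses.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Return population stdev / mean, a simple relative-uncertainty measure."""
    return pstdev(samples) / mean(samples)


# Hypothetical inferential replicates (e.g. posterior draws of read counts)
# for two transcripts that share most of their fragments. Ambiguous reads
# shift back and forth between them, so the draws are anticorrelated.
t1 = [10.0, 90.0, 30.0, 70.0, 50.0]
t2 = [92.0, 12.0, 68.0, 31.0, 49.0]

# Treat the two transcripts as one group by summing replicate by replicate.
group = [a + b for a, b in zip(t1, t2)]

cv_t1 = coefficient_of_variation(t1)
cv_t2 = coefficient_of_variation(t2)
cv_group = coefficient_of_variation(group)

print(f"CV transcript 1: {cv_t1:.3f}")
print(f"CV transcript 2: {cv_t2:.3f}")
print(f"CV group total:  {cv_group:.3f}")
```

Because the per-transcript draws move in opposite directions, their sum stays nearly constant across replicates, so the group's relative uncertainty is much smaller than that of either transcript alone.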

Rob Patro | Avi Srivastava | Hirak Sarkar | Michael I. Love | Héctor Corrada Bravo
