Updating RNA-Seq analyses after re-annotation

UNLABELLED The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example, on the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses. We present a novel efficient algorithm for updating abundance estimates from RNA-Seq experiments on re-annotation that does not require re-analysis of the entire dataset. Our approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. We demonstrate the effectiveness of our methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, we provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised. AVAILABILITY AND IMPLEMENTATION Our methods are implemented in software called ReXpress and are freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

[1]  Peter D Karp,et al.  The past, present and future of genome-wide re-annotation , 2002, Genome Biology.

[2]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[3]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[4]  Raymond K. Auerbach,et al.  The real cost of sequencing: higher than you think! , 2011, Genome Biology.

[5]  Gunnar Rätsch,et al.  Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data , 2011, BMC Bioinformatics.

[6]  Tao Jiang,et al.  IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly - (Extended Abstract) , 2011, RECOMB.

[7]  B. Graveley The developmental transcriptome of Drosophila melanogaster , 2010, Nature.

[8]  L. Pachter Models for transcript quantification from RNA-Seq , 2011, 1104.3889.

[9]  Colin N. Dewey,et al.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome , 2011, BMC Bioinformatics.

[10]  Tao Jiang,et al.  Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads , 2012, Bioinform..

[11]  Tatiana A. Tatusova,et al.  NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy , 2011, Nucleic Acids Res..

[12]  Inanç Birol,et al.  Dissect: detection and characterization of novel structural alterations in transcribed sequences , 2012, Bioinform..

[13]  Krishna R. Kalari,et al.  Detection of redundant fusion transcripts as biomarkers or disease-specific therapeutic targets in breast cancer. , 2012, Cancer research.

[14]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[15]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[16]  Orion J. Buske,et al.  iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data , 2013, Genome research.

[17]  L. Pachter,et al.  Streaming fragment assignment for real-time analysis of sequencing experiments , 2012, Nature Methods.

[18]  David G Hendrickson,et al.  Differential analysis of gene regulation at transcript resolution with RNA-seq , 2012, Nature Biotechnology.

[19]  A. Brazma,et al.  Reuse of public genome-wide gene expression data , 2012, Nature Reviews Genetics.