Efficient Heuristic for Decomposing a Flow with Minimum Number of Paths

Motivated by transcript assembly and multiple genome assembly problems, in this paper, we study the following minimum path flow decomposition problem: given a directed acyclic graph G = (V,E) with source s and sink t and a flow f, compute a set of s-t paths P and assign weight w(p) for p ∈ P such that , and |P| is minimized. We propose an efficient pseudo-polynomialtime heuristic for this problem based on novel insights. Our heuristic gives a framework that consists of several components, providing a roadmap for continuing development of better heuristics. Through experimental studies on both simulated and transcript assembly instances, we show that our algorithm significantly improves the previous state-of-the-art algorithm. Implementation of our algorithm is available at https://github.com/Kingsford-Group/catfish.

[1]  Martin Skutella,et al.  The k-Splittable Flow Problem , 2005, Algorithmica.

[2]  Gregory R. Grant,et al.  Benchmark analysis of algorithms for determining and quantifying full-length mRNA splice forms from RNA-seq data , 2015, Bioinform..

[3]  Steven J. M. Jones,et al.  De novo assembly and analysis of RNA-seq data , 2010, Nature Methods.

[4]  Mihai Pop,et al.  Genome assembly reborn: recent computational challenges , 2009, Briefings Bioinform..

[5]  R. Guigó,et al.  Modelling and simulating generic RNA-Seq experiments with the flux simulator , 2012, Nucleic acids research.

[6]  Stephen C. Ekker,et al.  Mojo Hand, a TALEN design tool for genome editing applications , 2013, BMC Bioinformatics.

[7]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[8]  Faraz Hach,et al.  CLIIQ: Accurate Comparative Detection and Quantification of Expressed Isoforms in a Population , 2012, WABI.

[9]  Cole Trapnell,et al.  TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions , 2013, Genome Biology.

[10]  Tao Jiang,et al.  Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads , 2012, Bioinform..

[11]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature biotechnology.

[12]  Steven L Salzberg,et al.  HISAT: a fast spliced aligner with low memory requirements , 2015, Nature Methods.

[13]  Thomas R. Gingeras,et al.  STAR: ultrafast universal RNA-seq aligner , 2013, Bioinform..

[14]  Brendan Mumey,et al.  Parity Balancing Path Flow Decomposition and Routing , 2015, 2015 IEEE Globecom Workshops (GC Wkshps).

[15]  Martin Vingron,et al.  Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels , 2012, Bioinform..

[16]  N. Friedman,et al.  Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data , 2011, Nature Biotechnology.

[17]  Tao Jiang,et al.  IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly - (Extended Abstract) , 2011, RECOMB.

[18]  Alexandru I. Tomescu,et al.  A novel min-cost flow method for estimating transcript expression with RNA-Seq , 2013, BMC Bioinformatics.

[19]  Decomposition , 1902, The Indian medical gazette.

[20]  Gunnar Rätsch,et al.  MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples , 2013, Bioinform..

[21]  Haim Kaplan,et al.  How to split a flow? , 2012, 2012 Proceedings IEEE INFOCOM.

[22]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[23]  S. Salzberg,et al.  StringTie enables improved reconstruction of a transcriptome from RNA-seq reads , 2015, Nature Biotechnology.

[24]  G. Sherlock,et al.  Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads , 2010, BMC Genomics.

[25]  J. Harrow,et al.  Assessment of transcript reconstruction methods for RNA-seq , 2013, Nature Methods.

[26]  Xun Xu,et al.  SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads , 2013, Bioinform..

[27]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[28]  James B. Brown,et al.  Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation , 2011, Proceedings of the National Academy of Sciences.

[29]  Geet Duggal,et al.  Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment , 2015 .

[30]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature Biotechnology.

[31]  Orion J. Buske,et al.  iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data , 2013, Genome research.

[32]  Philippe Chrétienne,et al.  Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths , 2008, Eur. J. Oper. Res..

[33]  N. Friedman,et al.  Trinity : reconstructing a full-length transcriptome without a genome from RNA-Seq data , 2016 .

[34]  Anne de Jong,et al.  Adaptation of Hansenula polymorpha to methanol: a transcriptome analysis , 2010, BMC Genomics.