The maximum similarity partitioning problem and its application in the transcriptome reconstruction and quantification problem

Reconstructing and quantifying the RNA molecules in a cell at a given moment is an important problem in molecular biology, since it reveals which genes are being expressed and at what intensity. This problem is known as the transcriptome reconstruction and quantification problem (TRQP). Although several approaches have already been designed for the TRQP, none of them models it as a combinatorial optimisation problem over strings. To narrow this gap, we present a new combinatorial optimisation problem, the maximum similarity partitioning problem (MSPP), that models the TRQP. In addition, we prove that the MSPP is NP-complete in the strong sense, present a greedy heuristic for it, and report some experimental results.
