Genome-Guided Transcriptome Assembly in the Age of Next-Generation Sequencing

Next-generation sequencing technologies provide unprecedented power to explore, in great detail, the repertoire of genes and their alternative splice variants, which collectively define the transcriptome of a species. However, assembling short reads into full-length gene and transcript models presents significant computational challenges. We review current algorithms for assembling transcripts and genes from next-generation sequencing reads aligned to a reference genome, and outline areas for future improvement.
