Graph regularized, semi-supervised learning improves annotation of de novo transcriptomes

We present a new method, GRASS, for improving an initial annotation of de novo transcriptomes. GRASS makes the shared-sequence relationships between assembled contigs explicit in the form of a graph, and applies an algorithm that performs label propagation to transfer annotations between related contigs and modifies the graph topology iteratively. We demonstrate that GRASS increases the completeness and accuracy of the initial annotation, allows for improved differential analysis, and is very efficient, typically taking 10s of minutes.

[1]  M. Robinson,et al.  Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences , 2015, F1000Research.

[2]  Peng Cui,et al.  Dynamic regulation of genome-wide pre-mRNA splicing and stress tolerance by the Sm-like protein LSm5 in Arabidopsis , 2014, Genome Biology.

[3]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[4]  Anthony T Papenfuss,et al.  Analysis of the platypus genome suggests a transposon origin for mammalian imprinting , 2009, Genome Biology.

[5]  Geet Duggal,et al.  Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference , 2015, bioRxiv.

[6]  J. Ayers,et al.  De novo transcriptome assembly for the lobster Homarus americanus and characterization of differential gene expression across nervous system tissues , 2016, BMC Genomics.

[7]  Huanming Yang,et al.  Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. , 2010, Genome research.

[8]  Partha Pratim Talukdar,et al.  Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition , 2010, ACL.

[9]  Cole Trapnell,et al.  Computational methods for transcriptome annotation and quantification using RNA-seq , 2011, Nature Methods.

[10]  A. Oshlack,et al.  Corset: enabling differential gene expression analysis for de novo assembled transcriptomes , 2014, Genome Biology.

[11]  Anne de Jong,et al.  Adaptation of Hansenula polymorpha to methanol: a transcriptome analysis , 2010, BMC Genomics.

[12]  Shankar Kumar,et al.  Video suggestion and discovery for youtube: taking random walks through the view graph , 2008, WWW.

[13]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[14]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[15]  N. Friedman,et al.  Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data , 2011, Nature Biotechnology.

[16]  David G Hendrickson,et al.  Differential analysis of gene regulation at transcript resolution with RNA-seq , 2012, Nature Biotechnology.

[17]  Guiming Liu,et al.  Characterization of Common Carp Transcriptome: Sequencing, De Novo Assembly, Annotation and Comparative Genomics , 2012, PloS one.

[18]  J. Galindo,et al.  Applications of next generation sequencing in molecular ecology of non-model organisms , 2011, Heredity.

[19]  Robert Patro,et al.  Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms , 2013, ArXiv.

[20]  M. Robinson,et al.  Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. , 2015, F1000Research.

[21]  Dennis C. Friedrich,et al.  A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries , 2011, Genome Biology.

[22]  L. Coin,et al.  Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads , 2011, Genome Biology.

[23]  K. Jiang,et al.  Transcriptome Sequencing, De Novo Assembly and Differential Gene Expression Analysis of the Early Development of Acipenser baeri , 2015, PloS one.

[24]  Colin N. Dewey,et al.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis , 2013, Nature Protocols.

[25]  Rob Patro,et al.  Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms , 2013, Nature Biotechnology.

[26]  Johan A. Grahnen,et al.  Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery , 2010, BMC Genomics.

[27]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[28]  David Sankoff,et al.  Locating rearrangement events in a phylogeny based on highly fragmented assemblies , 2016, BMC Genomics.

[29]  Robert Patro,et al.  RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes , 2015, bioRxiv.