Visualisation and analysis of RNA-Seq assembly graphs

RNA-sequencing (RNA-Seq) is a powerful transcriptome profiling technology enabling transcript discovery and quantification. RNA-Seq datasets are large and are most commonly used as a source of gene-level quantification measurements, whilst the underlying read assemblies, if inspected at all, are usually viewed as sequence reads mapped onto a reference genome. Although sufficient for many needs, this visualisation approach can be limiting when the underlying transcript assemblies are complex: assembly errors can be difficult to spot and splicing events are hard to interpret. Here we report on the development of a graph-based visualisation method as a complementary approach to understanding transcript diversity and read assembly from short-read RNA-Seq data. After reads are mapped to the reference genome, a read-to-read comparison is performed on all reads mapping to a given gene, producing a matrix of weighted similarity scores between reads. This matrix is used to construct an RNA assembly graph in which nodes represent reads derived from cDNA and edges represent similarity scores between reads that exceed a defined threshold. The resulting graphs are visualised using Graphia Professional. This tool can render in 3D space the often large and complex graph topologies that result from DNA/RNA sequence assembly, and it supports overlaying information, e.g. transcript models, onto nodes. We have also implemented an analysis pipeline for creating RNA assembly graphs, with both command-line and web-based interfaces, that allows users to create and visualise these data. We demonstrate the utility of this approach on RNA-Seq data, describing the unusual structure of these graphs and showing how they can be used to identify assembly issues, repetitive sequences within transcripts and splice variants. We believe this approach has the potential to significantly improve our understanding of transcript complexity.
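The graph-construction step described above (all-vs-all comparison of the reads for one gene, then keeping only edges whose similarity exceeds a threshold) can be illustrated with a minimal sketch. The abstract does not specify how read-to-read similarity is scored, so this example assumes a simple stand-in, the Jaccard similarity of shared k-mers; the actual pipeline may use an alignment-based score instead. The function names (`read_similarity_matrix`, `assembly_graph`) and the parameters `k` and `threshold` are hypothetical, and the output is a generic weighted edge list rather than any specific Graphia import format.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_similarity_matrix(reads: dict[str, str], k: int = 11) -> dict[tuple[str, str], float]:
    """All-vs-all comparison of the reads mapped to one gene.

    Assumption: similarity is k-mer Jaccard, a stand-in for the unspecified
    scoring used by the original pipeline. The matrix is stored sparsely,
    keyed by (read_id, read_id) pairs, keeping only non-zero scores.
    """
    profiles = {rid: kmers(seq, k) for rid, seq in reads.items()}
    scores: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(profiles), 2):
        s = jaccard(profiles[a], profiles[b])
        if s > 0:
            scores[(a, b)] = s
    return scores


def assembly_graph(scores: dict[tuple[str, str], float],
                   threshold: float) -> list[tuple[str, str, float]]:
    """Nodes are reads; keep an edge only if its similarity is above threshold."""
    return [(a, b, s) for (a, b), s in scores.items() if s > threshold]


if __name__ == "__main__":
    # Toy reads: r1 and r2 overlap heavily, r3 shares no 4-mers with either.
    reads = {"r1": "ACGTACGTTGCA", "r2": "CGTACGTTGCAT", "r3": "GGGGCCCCAAAA"}
    for a, b, s in assembly_graph(read_similarity_matrix(reads, k=4), threshold=0.5):
        print(f"{a}\t{b}\t{s:.3f}")
```

Storing only non-zero pairs keeps the matrix sparse, which matters because reads from unrelated regions of a gene share nothing; the threshold then controls how densely connected the resulting graph is before it is laid out and visualised.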
