SplicingCompass: differential splicing detection using RNA-Seq data

MOTIVATION Alternative splicing is central to cellular processes and substantially increases transcriptome and proteome diversity. Aberrant splicing events often have pathological consequences and are associated with various diseases, including several cancer types. Next-generation RNA sequencing (RNA-seq) makes it possible to analyse alternative splicing on a large scale. However, algorithms for analysing alternative splicing from short-read sequencing are not yet fully established, and standard solutions are still lacking for a variety of data analysis tasks.
RESULTS We present a new method and software to predict genes that are differentially spliced between two conditions using RNA-seq data. Our method uses geometric angles between the high-dimensional vectors of exon read counts. With this, differential splicing can be detected even if the splicing events are complex and involve previously unknown splicing patterns. We applied our approach to two case studies, including neuroblastoma tumour data with favourable and unfavourable clinical courses. We show the validity of our predictions as well as the applicability of our method in the context of patient clustering. We verified our predictions by several methods, including simulated experiments and complementary in silico analyses. For the predicted genes, we found a significant number of exons with specific regulatory splicing factor motifs and a substantial number of publications linking those genes to alternative splicing. Furthermore, we successfully exploited splicing information to cluster tissues and patients. Finally, we found additional evidence of splicing diversity for many predicted genes in normalized read coverage plots and in reads that span exon-exon junctions.
AVAILABILITY SplicingCompass is licensed under the GNU GPL and freely available as a package in the statistical language R at http://www.ichip.de/software/SplicingCompass.html
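
The angle-based idea described in the results can be illustrated with a minimal Python sketch. It is not the SplicingCompass implementation (which is an R package). The function names `angle_between` and `mean_between_group_angle`, and the example read counts, are hypothetical and chosen only for illustration. The sketch shows one property that makes angles attractive here: scaling all exon counts of a gene by the same factor, as a pure change in expression level would, leaves the angle at zero, whereas a change in the relative usage of exons opens the angle.

```python
import math
from itertools import product


def angle_between(u, v):
    """Return the angle in radians between two exon read-count vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of exons")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for an all-zero count vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def mean_between_group_angle(group_a, group_b):
    """Average angle over all sample pairs drawn from two conditions.

    This summary statistic is an illustrative assumption, not the
    statistic used by the published method.
    """
    angles = [angle_between(u, v) for u, v in product(group_a, group_b)]
    return sum(angles) / len(angles)


# Hypothetical read counts for a gene with three exons.
same_pattern_low = [10, 20, 30]
same_pattern_high = [20, 40, 60]   # doubled expression, same exon usage
skipped_exon = [10, 0, 30]         # middle exon no longer covered

scaled_angle = angle_between(same_pattern_low, same_pattern_high)
skipping_angle = angle_between(same_pattern_low, skipped_exon)
```

Here `scaled_angle` is zero up to rounding, while `skipping_angle` is clearly positive, so a gene whose samples show large angles between conditions but small angles within a condition would be a candidate for differential splicing under this simplified view.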
