Novel Bioinformatics Method for Identification of Genome-Wide Non-Canonical Spliced Regions Using RNA-Seq Data

Setting: During endoplasmic reticulum (ER) stress, the endoribonuclease (RNase) Ire1α initiates removal of a 26 nt region from the mRNA encoding the transcription factor Xbp1 via an unconventional mechanism that, atypically, operates within the cytosol. The removal causes a frameshift in the open reading frame, which alters the transcriptional regulation of numerous downstream genes in response to ER stress as part of the unfolded protein response (UPR). Strikingly, other examples of targeted, unconventional splicing of short mRNA regions have yet to be reported.

Objective: Our goal was to develop an approach that identifies non-canonical, possibly very short, splicing regions using RNA-Seq data, and to apply it to ER stress-induced Ire1α heterozygous and knockout mouse embryonic fibroblast (MEF) cell lines to identify additional Ire1α targets.

Results: We developed a bioinformatics approach called the Read-Split-Walk (RSW) pipeline and evaluated it using two Ire1α heterozygous and two Ire1α-null samples. Our RSW pipeline detected the 26 nt non-canonical splice site in Xbp1 as the top hit in the heterozygous samples but not in the negative-control Ire1α knockout samples. We compared the Xbp1 results from our approach with results obtained using the alignment programs BWA, Bowtie2, STAR, and Exonerate and the Unix "grep" command. We then applied our RSW pipeline to RNA-Seq data from the SKBR3 human breast cancer cell line. RSW reported a large number of non-canonical spliced regions for 108 genes on chromosome 17, which were identified by an independent study.

Conclusions: We conclude that our RSW pipeline is a practical approach for identifying non-canonical splice junction sites at the genome-wide level. We demonstrate that our pipeline can detect novel splice sites in RNA-Seq data generated under similar conditions for multiple species, in our case mouse and human.
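The abstract does not describe how RSW works internally. As a rough illustration of the general idea the name suggests (split a read into two pieces, anchor one piece on the reference, then walk downstream looking for the other), the Python sketch below uses exact string matching on a toy sequence. It is not the authors' implementation: the function name `find_split_alignment`, the parameters `min_anchor` and `max_gap`, and all sequences are hypothetical and chosen only for illustration.

```python
def find_split_alignment(read, reference, min_anchor=8, max_gap=100):
    """Return (ref_start, split, gap) for the first exact split alignment, or None.

    ref_start: reference index where the left piece of the read matches
    split:     number of read bases in the left piece
    gap:       number of reference bases skipped between the two pieces
    """
    # Try every split point that leaves at least min_anchor bases on each side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        start = reference.find(left)
        while start != -1:
            left_end = start + split
            # Walk downstream one base at a time, looking for the right piece.
            for gap in range(1, max_gap + 1):
                if reference.startswith(right, left_end + gap):
                    return start, split, gap
            start = reference.find(left, start + 1)
    return None


# Toy data (hypothetical sequences, not taken from Xbp1 or any real transcript).
upstream = "ACGTTGCAAGCTTAGGCT"
removed = "GAGAGAGAGA"  # 10 nt region absent from the spliced read
downstream = "TCGATCGGATCCATGCAA"

reference = upstream + removed + downstream
spliced_read = upstream + downstream    # read spanning the removed region
unspliced_read = reference[10:40]       # read that matches contiguously

print(find_split_alignment(spliced_read, reference))
print(find_split_alignment(unspliced_read, reference))
```

On this toy input the spliced read is reported as `(0, 18, 10)`, meaning a 10 nt gap after the first 18 read bases, while the contiguous read yields `None`. A practical pipeline would also need to tolerate mismatches, handle both strands, and aggregate support across many reads, none of which is attempted here.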
