Vicinal: a method for the determination of ncRNA ends using chimeric reads from RNA-seq experiments

Non-coding (nc)RNAs are important structural and regulatory molecules, and understanding their functions requires accurate determination of their primary sequence and secondary structure. During cDNA synthesis, stem-loops at RNA 3′ ends can self-prime reverse transcription, creating RNA–cDNA chimeras. We found that chimeric RNA–cDNA fragments can also be detected at 5′ end stem-loops, although at much lower frequency. Using the Gubler–Hoffman method, both types of chimeric fragments can be converted to cDNA during library construction, and they are readily detectable in high-throughput RNA sequencing (RNA-seq) experiments. Here, we show that these chimeric reads contain valuable information about the boundaries of ncRNAs. We developed a bioinformatic method, called Vicinal, to precisely map the ends of numerous fruit fly, mouse and human ncRNAs. Using this method, we analyzed chimeric reads from over 100 RNA-seq datasets, and we make the results available so that users can look up RNAs of interest. In summary, we show that Vicinal is a useful tool for determining the precise boundaries of uncharacterized ncRNAs, facilitating further structure/function studies.
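To make the idea of a chimeric read concrete, the following minimal Python sketch shows how a self-primed 3′ end could in principle be recognized in a single read. It is an illustration only and not the Vicinal implementation: the function names, the exact-match search and the `min_arm` threshold are all assumptions introduced here. The sketch assumes that a chimeric read consists of a sense arm ending at the RNA 3′ end, followed directly by an antisense arm that is the reverse complement of an upstream region of the same RNA.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_3prime_chimera(read, ref, min_arm=8):
    """Look for a sense arm followed by an antisense arm in one read.

    Hypothetical helper (not part of Vicinal). Tries every split point
    that leaves at least `min_arm` nucleotides on each side. Returns
    (three_prime_end, priming_site) as 0-based exclusive coordinates on
    `ref`, or None if no split point fits the chimera model.
    """
    for split in range(min_arm, len(read) - min_arm + 1):
        sense_arm, antisense_arm = read[:split], read[split:]

        # The sense arm must match the reference exactly (assumption:
        # no mismatches; a real tool would use an aligner).
        sense_start = ref.find(sense_arm)
        if sense_start == -1:
            continue

        # The antisense arm must match the reference once it is
        # reverse-complemented back to the sense orientation.
        template = reverse_complement(antisense_arm)
        template_start = ref.find(template)
        if template_start == -1:
            continue

        three_prime_end = sense_start + len(sense_arm)
        priming_site = template_start + len(template)

        # Self-priming copies sequence upstream of the RNA 3' end.
        if priming_site <= three_prime_end:
            return three_prime_end, priming_site
    return None


# Toy reference and a synthetic chimeric read (made-up sequences).
ref = "GATTACAGGCTTAACCGGATCCATGCAAGTCTGACCTAGG"
chimeric_read = ref[18:30] + reverse_complement(ref[5:15])
print(find_3prime_chimera(chimeric_read, ref))
```

In this toy case the junction between the two arms marks the inferred 3′ end of the RNA. Real data would require tolerance for mismatches, genome-scale alignment and support from multiple reads, none of which this sketch attempts.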
