Finding sRNA generative locales from high-throughput sequencing data with NiBLS

Background
Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important application is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Few methods currently exist for finding these small RNA generative locales.

Results
We describe and implement an algorithm that determines small RNA generative locales from high-throughput sequencing data. The algorithm builds a network, or graph, of the small RNAs by linking them according to their proximity on the target genome. For each subnetwork in the resulting graph, the clustering coefficient, a measure of how interconnected the subnetwork is, is used to identify the generative locales. We test the algorithm over a wide range of parameters, using RFAM sequences as positive controls, and show that it has good sensitivity and specificity on a range of Arabidopsis and mouse small RNA sequence sets, and that the locales it produces are robust to the choice of parameters.

Conclusions
NiBLS is a fast, reliable and sensitive method for determining small RNA locales in high-throughput sequence data, and it is generally applicable to all classes of small RNA.
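The graph-based procedure described in the Results can be sketched in Python. This is a minimal illustration, not the NiBLS implementation. It assumes that each mapped sRNA is reduced to a (chromosome, position) pair, that two sRNAs are linked when their positions differ by at most a hypothetical `max_dist`, that a subnetwork is a connected component of the graph, and that a subnetwork is reported as a locale when its average local clustering coefficient reaches a hypothetical threshold `min_cc`. All function and parameter names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(hits, max_dist):
    """Link sRNA hits on the same chromosome whose positions differ by <= max_dist.

    hits: list of (chrom, pos) tuples (an assumed, simplified representation).
    Returns an adjacency dict: hit index -> set of neighbouring hit indices.
    """
    adj = {i: set() for i in range(len(hits))}
    by_chrom = defaultdict(list)
    for i, (chrom, pos) in enumerate(hits):
        by_chrom[chrom].append((pos, i))
    for entries in by_chrom.values():
        entries.sort()  # sorted positions let us stop scanning once the gap is too large
        for a, (pos_a, i) in enumerate(entries):
            b = a + 1
            while b < len(entries) and entries[b][0] - pos_a <= max_dist:
                j = entries[b][1]
                adj[i].add(j)
                adj[j].add(i)
                b += 1
    return adj


def components(adj):
    """Return the connected components (subnetworks) as sorted lists of node indices."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return comps


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: nodes with fewer than two neighbours score 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj, comp):
    """Mean local clustering coefficient over the nodes of one subnetwork."""
    return sum(local_clustering(adj, n) for n in comp) / len(comp)


def find_locales(hits, max_dist, min_cc, min_size=2):
    """Report (chrom, start, end, n_srnas, clustering) for dense subnetworks.

    max_dist, min_cc and min_size are hypothetical parameters for this sketch.
    """
    adj = build_graph(hits, max_dist)
    locales = []
    for comp in components(adj):
        if len(comp) < min_size:
            continue
        cc = average_clustering(adj, comp)
        if cc >= min_cc:
            positions = [hits[i][1] for i in comp]
            chrom = hits[comp[0]][0]  # edges never cross chromosomes
            locales.append((chrom, min(positions), max(positions), len(comp), cc))
    return locales


if __name__ == "__main__":
    hits = [("chr1", 100), ("chr1", 110), ("chr1", 120),
            ("chr1", 5000), ("chr2", 100), ("chr2", 130)]
    print(find_locales(hits, max_dist=50, min_cc=0.5))
```

With the toy input above, the three tightly packed hits on `chr1` form a fully connected triangle (clustering coefficient 1.0) and are reported as one locale, while the isolated hit and the two-hit pair on `chr2` are not. Sorting positions per chromosome means each sRNA is compared only with hits inside its `max_dist` window rather than with every other sRNA. Under this particular definition, a sparse chain of hits, or a two-hit subnetwork, scores 0 because no neighbours of a node are linked to each other; a real tool may treat such cases differently.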
