SURFR: A Real-Time Platform for Non-Coding RNA Fragmentation Analysis Using Wavelets

MicroRNAs (miRNAs or miRs) are small (~18-25 nt) yet highly potent non-coding RNA-derived RNAs (ndRNAs), originating from pre-miRNA fragmentation, that have been shown to alter the post-transcriptional functionality of many messenger RNAs (mRNAs). Identifying and studying miRNAs is critical because of their increasing significance as biomarkers for many types of cancer and other genetic diseases. Empirical evidence supporting the existence of novel ndRNAs excised from other, longer non-coding RNAs (ncRNAs) is growing, and recent evidence suggests that the full extent of their prevalence is likely underappreciated. Although some computational methods have been designed to help domain experts identify and understand miRNAs by analyzing Next Generation Sequencing (NGS) datasets, state-of-the-art in silico methods still face crucial challenges in efficiency, effectiveness, and generalizability. To address these problems, our group proposed a new algorithm that mines ndRNAs by applying wavelet-based signal processing techniques rather than the string-based NGS sequence alignment and analysis in current use. However, because of the novelty of the approach, the initial version of the algorithm focused specifically on mining miRNAs, snoRNA-derived RNAs (sdRNAs), and transfer RNA (tRNA) fragments (tRFs), owing to their importance in the literature and the availability of experimentally validated databases to confirm our findings. Beyond these computational issues, we still lack a basic understanding of the existence and range of functionalities of a) ndRNAs other than miRs, sdRNAs, and tRFs in humans, and b) all ndRNAs in the millions of organisms other than humans. Hence, there is an urgent need to automate the extraction of, and experimentation on, ndRNAs, especially considering the rate at which NGS data are being produced.
Therefore, in the current article, we extended our algorithm to be applicable to ~500 organisms (including eukaryotes, plants, bacteria, fungi, and protists), along with all of their ncRNAs available in the current NCBI annotation. We also constructed a real-time, user-friendly platform, SURFR, available at salts.soc.southalabama.edu/surfr, to help domain experts and aspiring biomedical scientists perform RNA-Seq experiments to study ndRNAs. Not only is the platform extremely efficient, but it also allows users to identify, analyze, visualize, and compare ndRNAs from up to 30 NGS files for rigorous experimentation. Moreover, access to NGS files from public databases like SRA, and to ndRNAs from private databases like TCGA, is made readily available to users to further validate their novel findings. Finally, we provide theoretical validation to examine our platform’s effectiveness.
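
To give a concrete sense of what treating fragmentation as a signal-processing problem can look like, the following is a minimal illustrative sketch and not the SURFR algorithm itself. It assumes a hypothetical per-nucleotide read-depth profile along a single ncRNA and applies a Haar-style step detector at one fixed scale: the strongest rise in depth is taken as a candidate fragment start and the strongest drop as its end. The function names, the scale, and the example profile are all assumptions made for illustration.

```python
def haar_detail(signal, scale):
    """Haar-style detail coefficient at every boundary between positions.

    Coefficient i is the mean of the `scale` values starting at position i
    minus the mean of the `scale` values just before it. Positions outside
    the transcript are treated as zero depth.
    """
    n = len(signal)
    padded = [0.0] * scale + list(signal) + [0.0] * scale
    coeffs = []
    for i in range(n + 1):  # boundary i sits just before signal[i]
        left = padded[i:i + scale]
        right = padded[i + scale:i + 2 * scale]
        coeffs.append((sum(right) - sum(left)) / scale)
    return coeffs


def strongest_fragment(coverage, scale=4):
    """Return (start, end) of the most pronounced step-up/step-down pair.

    `start` is inclusive and `end` is exclusive, both 0-based. This is an
    illustrative heuristic: it reports a single candidate fragment only.
    """
    coeffs = haar_detail(coverage, scale)
    start = max(range(len(coeffs)), key=coeffs.__getitem__)  # sharpest rise
    end = min(range(len(coeffs)), key=coeffs.__getitem__)    # sharpest drop
    return start, end


# Hypothetical read-depth profile: low background with one 22-nt fragment.
coverage = [2] * 10 + [50] * 22 + [3] * 18
start, end = strongest_fragment(coverage)
print(start, end, end - start)  # prints: 10 32 22
```

In this toy profile the detected fragment spans positions 10 to 31 and is 22 nt long, within the ~18-25 nt range typical of miRNAs. A practical method would need to handle multiple scales, noise, and several fragments per transcript, none of which this sketch attempts.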