Deep learning predicts short non-coding RNA functions from only raw sequence data

Small non-coding RNAs (ncRNAs) are short RNA transcripts that are not translated into proteins. They are involved in gene regulation across many biological processes and diseases. Their biological functions are still not fully understood, especially at genome-wide scale, and this has motivated new computational approaches to annotate their roles. Secondary structure is widely regarded as a key determinant of RNA function, and machine learning approaches have successfully predicted RNA function from secondary structure information. Here we show that RNA function can be predicted with good accuracy from a lightweight representation of sequence information alone, without computing secondary structure features, which is computationally expensive. This finding appears to challenge the dogma that secondary structure is a key determinant of RNA function. Compared with recent secondary-structure-based methods, the proposed solution is more robust to noise at the sequence boundaries and drastically reduces computational cost, making it practical to annotate large data volumes. Scripts and datasets to reproduce the experiments in this study are available at: https://github.com/bioinformatics-sannio/ncrna-deep.
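The abstract does not specify which lightweight sequence representation is used. As a hedged illustration only, the sketch below assumes a plain one-hot encoding with fixed-length padding, a common input format for convolutional networks. It also includes a hypothetical helper that mimics boundary noise by adding random flanking nucleotides. Neither `one_hot_encode` nor `add_boundary_noise` is taken from the ncrna-deep repository. The function names, the ACGU alphabet and column order, and the right-padding scheme are all assumptions.

```python
import numpy as np

# Assumed nucleotide alphabet and column order for the one-hot matrix.
ALPHABET = "ACGU"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot_encode(seq: str, max_len: int) -> np.ndarray:
    """Encode an RNA sequence as a (max_len, 4) one-hot matrix (illustrative sketch).

    Longer sequences are truncated, shorter ones are zero-padded on the
    right, T is read as U, and any other symbol (e.g. IUPAC "N") is left
    as an all-zero row.
    """
    matrix = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq.upper().replace("T", "U")[:max_len]):
        col = INDEX.get(base)
        if col is not None:
            matrix[pos, col] = 1.0
    return matrix


def add_boundary_noise(seq: str, max_flank: int, rng: np.random.Generator) -> str:
    """Hypothetical helper: add 0..max_flank random bases to each end of seq."""
    def flank(n: int) -> str:
        return "".join(rng.choice(list(ALPHABET), size=n))

    left = int(rng.integers(0, max_flank + 1))   # inclusive upper bound max_flank
    right = int(rng.integers(0, max_flank + 1))
    return flank(left) + seq + flank(right)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = one_hot_encode("ACGUN", max_len=8)
    print(x.shape)               # (8, 4)
    print(x[:4].argmax(axis=1))  # [0 1 2 3]
    print(x[4:].sum())           # 0.0 ("N" and padding rows are all zero)
    noisy = add_boundary_noise("ACGUACGU", max_flank=5, rng=rng)
    print(noisy, one_hot_encode(noisy, max_len=20).shape)
```

The encoding is a single linear pass over the sequence, so its cost grows as O(L) in sequence length L. Zuker-style dynamic-programming secondary structure prediction, by contrast, grows as O(L^3). This contrast is consistent with the reduced computational cost claimed above, although the actual savings depend on the model and the full pipeline.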
