Single-sequence and profile-based prediction of RNA solvent accessibility using dilated convolutional neural network

MOTIVATION RNA solvent accessibility, like protein solvent accessibility, reflects the structural regions that are accessible to solvents or other functional biomolecules, and plays an important role in structural and functional characterization. In contrast to protein solvent accessibility, only a few tools are available for predicting RNA solvent accessibility, even though millions of RNA transcripts have unknown structures and functions. These tools also have limited accuracy. Here, we have developed RNAsnap2, which uses a dilated convolutional neural network with a new feature based on base-pairing probabilities predicted by LinearPartition.

RESULTS Using the same training set as the recent predictor RNAsol, RNAsnap2 provides an 11% improvement in median Pearson correlation coefficient (PCC) and a 9% improvement in mean absolute error for the same test set of 45 RNA chains. A larger improvement (22% in median PCC) is observed for 31 newly deposited RNA chains that are non-redundant and independent of the training and test sets. A single-sequence version of RNAsnap2 (i.e. one that does not use sequence profiles generated from homology search by Infernal) achieves performance comparable to that of the profile-based RNAsol. In addition, RNAsnap2 achieves comparable performance for protein-bound and protein-free RNAs. Both RNAsnap2 and RNAsnap2 (SingleSeq) are expected to be useful for searching for structural signatures and locating functional regions of non-coding RNAs.

AVAILABILITY AND IMPLEMENTATION Standalone versions of RNAsnap2 and RNAsnap2 (SingleSeq) are available at https://github.com/jaswindersingh2/RNAsnap2. Direct predictions can also be made at https://sparks-lab.org/server/rnasnap2. The datasets used in this research can be downloaded from the GitHub repository and the web server mentioned above.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
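The two evaluation metrics named in the results, median PCC and mean absolute error, can be illustrated with a minimal sketch. It assumes that each chain contributes one PCC and one mean absolute error, and that the per-chain errors are then averaged; the abstract does not state the aggregation for the error, so that choice is an assumption. The function names and the toy values are hypothetical and are not taken from RNAsnap2.

```python
import math
from statistics import median


def pearson(pred, true):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


def mean_absolute_error(pred, true):
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def evaluate(chains):
    """Return (median per-chain PCC, mean of per-chain MAE).

    `chains` is a list of (predicted, observed) pairs, one pair per RNA chain.
    Averaging the per-chain MAE is an assumption made for this sketch.
    """
    pccs = [pearson(p, t) for p, t in chains]
    maes = [mean_absolute_error(p, t) for p, t in chains]
    return median(pccs), sum(maes) / len(maes)


# Hypothetical toy data: three short "chains" of per-nucleotide accessibility.
toy_chains = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),   # perfectly correlated
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),   # perfectly anti-correlated
    ([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]),  # identical
]
median_pcc, mean_mae = evaluate(toy_chains)
```

The median is taken over chains so that a few chains with very poor correlation do not dominate the summary, which is a common reason for reporting a median rather than a mean PCC.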
