RNAcmap: A Fully Automatic Method for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis

Motivation: The accuracy of RNA secondary and tertiary structure prediction can be significantly improved by using structural restraints derived from evolutionary or direct coupling analysis. Currently, these coupling analyses rely on manually curated multiple sequence alignments collected in the Rfam database, which contains 3016 families. By comparison, millions of non-coding RNA sequences are known. Here, we established RNAcmap, a fully automatic method that enables evolutionary coupling analysis for any RNA sequence. The homology search is based on a covariance model built by Infernal from the secondary structure given by one of two predictors: the folding-based algorithm RNAfold and the latest deep-learning method SPOT-RNA.

Results: We show that the performance of RNAcmap depends less on the specific evolutionary coupling tool than on the accuracy of the secondary structure predictor, with the best performance given by RNAcmap (SPOT-RNA). The performance of RNAcmap (SPOT-RNA) is comparable to that based on Rfam-supplied alignments and remains consistent for sequences that are not in the Rfam collection. Further improvement can be made with a simple meta predictor, RNAcmap (SPOT-RNA/RNAfold), which selects the secondary structure predictor that finds more homologous sequences. Reliable base-pairing information generated by RNAcmap, in particular for RNAs with a high number of effective homologous sequences, will be useful for aiding RNA structure prediction.

Availability and implementation: RNAcmap is available as a web server at https://sparks-lab.org/server/rnacmap/ and as a standalone application along with the datasets at https://github.com/sparks-lab-org/RNAcmap.
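The selection rule behind the meta predictor can be illustrated with a minimal Python sketch. It is not the RNAcmap implementation: the function names, the 80% identity threshold, and the use of a reweighted effective sequence count (rather than a raw count) are assumptions made here for illustration, since the abstract only states that the choice depends on which predictor finds more homologous sequences.

```python
from typing import Dict, List


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned columns at which two equal-length rows agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_count(msa: List[str], threshold: float = 0.8) -> float:
    """Effective number of sequences in an alignment.

    Each row is down-weighted by the number of rows (itself included)
    that share at least `threshold` identity with it. The 0.8 default
    is an assumed convention, not a value taken from the paper.
    """
    total = 0.0
    for a in msa:
        neighbours = sum(sequence_identity(a, b) >= threshold for b in msa)
        total += 1.0 / neighbours
    return total


def pick_alignment(alignments: Dict[str, List[str]]) -> str:
    """Return the label of the alignment with the largest effective count."""
    return max(alignments, key=lambda name: effective_count(alignments[name]))


# Toy alignments (hypothetical data): one redundant, one diverse.
redundant = ["GGGAAACCC", "GGGAAACCC", "GGGAAACCU"]
diverse = ["GGGAAACCC", "CCCUUUGGG", "AUAUAUAUA"]
chosen = pick_alignment({"SPOT-RNA": redundant, "RNAfold": diverse})
```

In this toy case the three near-identical rows collapse to a single effective sequence, so the more diverse alignment is selected; with real data the labels would correspond to the alignments produced from each secondary structure predictor.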
