Evaluating DCA-based method performances for RNA contact prediction by a well-curated data set

RNA molecules play many pivotal roles in cellular function that are still not fully understood. Any detailed understanding of RNA function requires knowledge of its three-dimensional structure, yet experimental RNA structure determination remains demanding. Recent advances in sequencing provide unprecedented amounts of sequence data that can be analysed statistically by methods such as Direct Coupling Analysis (DCA) to infer spatial proximity, or contacts, between specific nucleotide pairs; these predicted contacts can in turn improve the quality of structure prediction. To quantify this improvement, we present a well-curated dataset of about seventy high-resolution RNA structures and use it to compare different nucleotide-nucleotide contact prediction methods available in the literature. We observe only minor differences between the performances of the different methods. Moreover, we discuss how robust these predictions are to different contact definitions and how strongly they depend on the procedures used to curate and align the families of homologous RNA sequences.
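Comparing contact predictors against known structures involves two ingredients: a contact definition applied to the 3D structure, and a score for how many top-ranked predicted pairs are true contacts. The following is a minimal sketch of one common way to do this, not the paper's actual pipeline. The function names, the minimum inter-atom distance criterion, the 8 Å default cutoff and the minimum sequence separation of 4 are all illustrative assumptions. The score used here is the positive predictive value (PPV) of the k highest-scoring pairs.

```python
import math
from itertools import combinations


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between two lists of (x, y, z) atom coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def contact_set(residues, cutoff=8.0, min_separation=4):
    """Return the set of (i, j) pairs, i < j, considered in contact.

    residues: one list of atom coordinates per nucleotide.
    Assumed contact definition (illustrative): minimum inter-atom distance
    below `cutoff` (in angstroms) and at least `min_separation` positions
    apart along the sequence.
    """
    contacts = set()
    for i, j in combinations(range(len(residues)), 2):
        if j - i >= min_separation and min_distance(residues[i], residues[j]) < cutoff:
            contacts.add((i, j))
    return contacts


def ppv_top_k(scores, true_contacts, k, min_separation=4):
    """Fraction of the k highest-scoring pairs that are true contacts.

    scores: dict mapping (i, j), i < j, to a coupling score (e.g. from DCA).
    Pairs closer than `min_separation` in sequence are ignored.
    """
    candidates = [(s, i, j) for (i, j), s in scores.items() if j - i >= min_separation]
    candidates.sort(reverse=True)  # highest score first
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for _, i, j in top)
    return hits / len(top)


# Toy example: six nucleotides with one atom each.
residues = [[(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)], [(8.0, 0.0, 0.0)],
            [(12.0, 0.0, 0.0)], [(3.0, 3.0, 0.0)], [(20.0, 0.0, 0.0)]]
scores = {(0, 4): 0.9, (1, 5): 0.7, (0, 5): 0.1, (0, 1): 5.0}

for cutoff in (8.0, 17.0):
    truth = contact_set(residues, cutoff=cutoff)
    print(cutoff, sorted(truth), ppv_top_k(scores, truth, k=2))
```

Recomputing the PPV with contact sets built at several cutoffs, as in the loop above, is one simple way to check how sensitive a ranking of predicted pairs is to the chosen contact definition.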