A Large-Scale Assessment of Nucleic Acids Binding Site Prediction Programs

Computational prediction of nucleic acid binding sites in proteins is necessary to disentangle the functional mechanisms of most biological processes and to explore the binding mechanisms. Several strategies have been proposed, but the state-of-the-art approaches differ widely in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithmic methods behind the prediction strategies; iv) the performance measures; and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets comprising more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have greatly improved over the years; ii) some of the approaches suffer from theoretical defects, and there is still room for sorting out the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces; and iv) dataset bias may exist in some methods.
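The binary criteria named above have standard textbook definitions based on per-residue counts of true/false positives and negatives. The following is a minimal sketch of those definitions, not the assessment pipeline used in the study. The function name `binary_metrics`, the `BinaryMetrics` container, and the 0/1 label encoding (1 = binding residue, 0 = non-binding residue) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the four criteria named in the abstract."""
    sensitivity: float
    specificity: float
    precision: float
    accuracy: float


def binary_metrics(true_labels: Sequence[int],
                   predicted_labels: Sequence[int]) -> BinaryMetrics:
    """Compute standard binary criteria from per-residue labels.

    Assumed encoding: 1 = binding residue, 0 = non-binding residue.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN when a criterion is undefined (empty denominator).
        return numerator / denominator if denominator else float("nan")

    return BinaryMetrics(
        sensitivity=ratio(tp, tp + fn),   # TP / (TP + FN)
        specificity=ratio(tn, tn + fp),   # TN / (TN + FP)
        precision=ratio(tp, tp + fp),     # TP / (TP + FP)
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
    )


# Toy example with made-up labels for an 8-residue protein.
truth = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0]
metrics = binary_metrics(truth, predicted)
print(metrics)
```

Because binding residues are usually a small minority of a protein's residues, accuracy alone can look high for a predictor that labels almost everything as non-binding, which is one reason to report several criteria together.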
