Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains

Proteins interact with a variety of molecules, including other proteins and nucleic acids. We review a comprehensive collection of over 50 studies that analyze and/or predict these interactions. The majority of these studies address solely protein-DNA or solely protein-RNA binding, and only a few have a wider scope that covers both protein-protein and protein-nucleic acid binding. Our analysis reveals that binding residues are typically characterized by three hallmarks: relative solvent accessibility (RSA), evolutionary conservation and propensity of amino acids (AAs) for binding. Motivated by drawbacks of the prior studies, we perform a large-scale analysis to quantify and contrast the three hallmarks for DNA-, RNA- and protein-binding residues and, for the first time, for multi-ligand-binding residues that interact with DNA and proteins, and with RNA and proteins. Results generated on a well-annotated data set of over 23 000 proteins show that conservation is higher for nucleic acid-binding than for protein-binding residues. Multi-ligand-binding residues are more conserved and have higher RSA than single-ligand-binding residues. We empirically show that each hallmark, even RSA predicted from sequence, discriminates between binding and nonbinding residues, and that combining the hallmarks improves discriminatory power for each of the five types of interactions. Linear scoring functions that combine these hallmarks offer good predictive performance of residue-level propensity for binding and provide intuitive interpretation of predictions. Better understanding of these residue-level interactions will facilitate development of methods that accurately predict binding in the exponentially growing databases of protein sequences.
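To illustrate what a linear scoring function over the three hallmarks might look like, the following is a minimal Python sketch. It is not the scoring function from this study: the weights, the bias, the propensity values and the names `Residue`, `binding_score` and `contributions` are all hypothetical placeholders chosen for illustration, and each hallmark is assumed to be scaled to the range [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Residue:
    aa: str              # one-letter amino acid code
    rsa: float           # relative solvent accessibility, assumed scaled to [0, 1]
    conservation: float  # evolutionary conservation, assumed scaled to [0, 1]


# Hypothetical per-amino-acid binding propensities (illustrative values only,
# not taken from the paper).
PROPENSITY = {"R": 0.9, "K": 0.8, "G": 0.5, "L": 0.2}
DEFAULT_PROPENSITY = 0.5  # assumed fallback for amino acids not listed above

# Hypothetical weights and bias; in practice these would be fitted to
# annotated binding data separately for each interaction type.
WEIGHTS = {"rsa": 0.3, "conservation": 0.4, "propensity": 0.3}
BIAS = 0.0


def contributions(res: Residue) -> dict:
    """Return the weighted term that each hallmark adds to the score."""
    propensity = PROPENSITY.get(res.aa, DEFAULT_PROPENSITY)
    return {
        "rsa": WEIGHTS["rsa"] * res.rsa,
        "conservation": WEIGHTS["conservation"] * res.conservation,
        "propensity": WEIGHTS["propensity"] * propensity,
    }


def binding_score(res: Residue) -> float:
    """Linear combination of the three hallmarks plus a bias term."""
    return BIAS + sum(contributions(res).values())


if __name__ == "__main__":
    exposed_arg = Residue(aa="R", rsa=1.0, conservation=1.0)
    buried_leu = Residue(aa="L", rsa=0.0, conservation=0.0)
    for res in (exposed_arg, buried_leu):
        print(res.aa, binding_score(res), contributions(res))
```

Because the score is a plain weighted sum, the per-hallmark terms returned by `contributions` show directly how much each hallmark pushes a residue toward a binding prediction, which is the kind of intuitive interpretation a linear model allows.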
