Computational methods and tools for binding site recognition between proteins and small molecules: from classical geometrical approaches to modern machine learning strategies

In the current “genomic era”, the number of identified genes is growing exponentially, yet the biological function of many of the corresponding proteins remains unknown. In the vast majority of cases, protein function depends on the recognition of small-molecule ligands (e.g., substrates, inhibitors, and allosteric regulators), so knowing where on the protein this recognition takes place is essential for both function prediction and drug design. Computational methods are key tools for tackling this problem. In the last few years, a significant number of software tools have been developed that exploit protein sequence information, structure information, or both. This review describes the most recent developments in protein function recognition and binding site prediction, covering both freely available and commercial tools, detailing their main characteristics, and providing a comparative analysis of their performance.
