Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions has been developed. Regardless of their technical differences, all scoring functions need data sets that combine protein-ligand complex structures with binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in size and quality. In addition, standard metrics for evaluating scoring functions used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. In the first project, we have created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in the PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interactions and various other subjects. In particular, PDBbind has become a major data resource for scoring function development. In the second project, we have established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so that scoring functions can be tested in a relatively pure context that reflects their quality.
In our latest work on this track, i.e., CASF-2013, the performance of a scoring function was quantified in four aspects: "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions was tested as a demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress made so far, the performance of today's scoring functions still falls short of expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some of the obstacles underlying scoring function development, so that researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark to keep them useful as community resources.
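The abstract names the four performance aspects without spelling out how they are computed. As a rough illustration only, the sketch below assumes that "scoring power" is measured by the Pearson correlation between predicted scores and experimental binding constants, and that "ranking power" is the fraction of same-target clusters whose complexes a scoring function orders correctly. The function names, the cluster layout, and the numbers are hypothetical and are not taken from PDBbind or the CASF test set.

```python
import math


def pearson_r(experimental, predicted):
    """Pearson correlation between experimental affinities and predicted scores."""
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, predicted))
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    return sxy / math.sqrt(sxx * syy)


def ranking_success_rate(clusters):
    """Fraction of clusters whose members are ordered identically by
    experimental affinity and by predicted score.

    Each cluster is a list of (experimental, predicted) pairs for complexes
    formed by the same target protein. Assumption: a higher score means
    stronger predicted binding, and values within a cluster are distinct.
    """
    hits = 0
    for cluster in clusters:
        by_experiment = sorted(cluster, key=lambda pair: pair[0])
        by_prediction = sorted(cluster, key=lambda pair: pair[1])
        if by_experiment == by_prediction:
            hits += 1
    return hits / len(clusters)


# Made-up values, for illustration only (not real binding data).
toy_experimental = [2.0, 4.5, 6.1, 7.3, 9.0]
toy_predicted = [3.1, 4.0, 5.5, 8.2, 8.8]
toy_clusters = [
    [(4.0, 1.2), (6.0, 2.5), (8.0, 3.1)],  # ordered correctly
    [(5.0, 2.0), (6.5, 1.0), (9.0, 3.0)],  # two weakest binders swapped
]

if __name__ == "__main__":
    print("scoring power (Pearson r):", round(pearson_r(toy_experimental, toy_predicted), 3))
    print("ranking power (success rate):", ranking_success_rate(toy_clusters))
```

Both metrics take already-computed scores for fixed complex structures as input, which mirrors the idea of evaluating "scoring" separately from "sampling": no docking run is involved in either calculation.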
