Robust Scoring Functions for Protein-Ligand Interactions with Quantum Chemical Charge Models

Ordinary least-squares (OLS) regression has been widely used to construct scoring functions for protein-ligand interactions. However, OLS is highly sensitive to outliers, so models built with it are easily skewed by a few aberrant data points or even by the choice of data set. Separately, the determination of atomic charges is of central importance, because electrostatic interactions are a key contributor to biomolecular association. In the development of the AutoDock4 scoring function, only OLS was used for fitting, and the simple Gasteiger method was adopted for charge assignment. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin Model 1 bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared error: 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. Assessment of binding pose prediction on 100 external decoy sets indicates a high success rate of 87%, using the criterion that the predicted pose has a root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
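The sensitivity of OLS to outliers, and the way a robust fit with outlier exclusion can reduce it, can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes least trimmed squares (LTS) with simple concentration steps as the robust estimator, runs on synthetic one-descriptor data rather than AutoDock4 energy terms, and every function name and parameter (`lts_fit`, `keep_fraction`, `make_demo_data`, and so on) is hypothetical.

```python
import numpy as np


def design_matrix(X):
    """Prepend an intercept column to the descriptor array (1D or 2D)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def ols_fit(X, y):
    """Ordinary least squares; returns [intercept, slope(s)]."""
    coef, *_ = np.linalg.lstsq(design_matrix(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def lts_fit(X, y, keep_fraction=0.75, n_starts=50, n_csteps=20, seed=0):
    """Least trimmed squares via random starts and concentration steps.

    Minimizes the sum of the h smallest squared residuals, where
    h = ceil(keep_fraction * n). Returns the coefficients and a boolean
    mask of the h points treated as inliers (the rest are excluded).
    """
    rng = np.random.default_rng(seed)
    A = design_matrix(X)
    y = np.asarray(y, dtype=float)
    n, p = A.shape
    h = int(np.ceil(keep_fraction * n))
    best_obj, best_coef, best_keep = np.inf, None, None
    for _ in range(n_starts):
        # Start from an exact fit through p randomly chosen points.
        subset = rng.choice(n, size=p, replace=False)
        for _ in range(n_csteps):
            coef, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
            sq_res = (y - A @ coef) ** 2
            # Concentration step: keep the h best-fitted points.
            subset = np.argsort(sq_res)[:h]
        obj = sq_res[subset].sum()
        if obj < best_obj:
            keep = np.zeros(n, dtype=bool)
            keep[subset] = True
            best_obj, best_coef, best_keep = obj, coef, keep
    return best_coef, best_keep


def rmse(X, y, coef):
    """Root-mean-squared error of a fitted linear model."""
    residuals = np.asarray(y, dtype=float) - design_matrix(X) @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


def make_demo_data(n=100, n_outliers=20, seed=1):
    """Synthetic data: y = 2 + 3x + small noise, with shifted outliers."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)
    y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, n)
    y[-n_outliers:] -= 30.0  # contaminate the last points
    return x, y


if __name__ == "__main__":
    x, y = make_demo_data()
    b_ols = ols_fit(x, y)
    b_lts, keep = lts_fit(x, y)
    print("OLS coefficients:", b_ols)
    print("LTS coefficients:", b_lts)
    print("RMSE on retained points (LTS):", rmse(x[keep], y[keep], b_lts))
```

In this toy setting the contaminated points pull the OLS slope well away from the generating value of 3, whereas the trimmed fit recovers it and flags the contaminated points for exclusion. The trimming fraction is a free choice in the sketch and would need to be justified for real binding data.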
