SFCscoreRF: A Random Forest-Based Scoring Function for Improved Affinity Prediction of Protein-Ligand Complexes

A major shortcoming of empirical scoring functions for protein-ligand complexes is the low correlation between predicted and experimental binding affinities, which is frequently observed not only for large and diverse data sets but also for structure-activity relationship (SAR) series of individual targets. Improvements can be envisaged by developing new descriptors, employing larger training sets of higher quality, and resorting to more sophisticated regression methods. Herein, we describe how SFCscore descriptors, a PDBbind training set of 1005 complexes, and random forest regression were combined to develop an improved scoring function. The result is SFCscore(RF), a new scoring function with significantly improved performance on the PDBbind and CSAR-NRC HiQ benchmarks in comparison to previously developed SFCscore functions. A leave-cluster-out cross-validation and the performance in the CSAR 2012 scoring exercise reveal remaining limitations but also point to directions for further improvements of SFCscore(RF) and empirical scoring functions in general.
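
The two evaluation ideas named above, leave-cluster-out cross-validation and the correlation between predicted and experimental affinities, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `leave_cluster_out_folds` and `pearson_r` and the toy cluster labels are hypothetical, and the random forest model itself is not shown. The sketch only assumes that each complex carries a cluster label (for example, a protein family) and that all complexes of one cluster are held out together.

```python
from collections import defaultdict
from math import sqrt


def leave_cluster_out_folds(cluster_ids):
    """Yield (held_out_cluster, train_indices, test_indices), one fold per cluster.

    All complexes sharing a cluster label are held out together, so the
    model is never tested on a cluster it has seen during training.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    for cid in sorted(members):
        test_idx = members[cid]
        train_idx = [i for i, c in enumerate(cluster_ids) if c != cid]
        yield cid, train_idx, test_idx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy labels: one cluster label per complex (not real data).
clusters = ["kinase", "kinase", "protease", "protease", "protease", "other"]
folds = list(leave_cluster_out_folds(clusters))
```

In a full workflow, a regression model would be fitted on the descriptors of `train_indices` in each fold, and `pearson_r` would compare its predictions for `test_indices` with the experimental affinities of the held-out cluster.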
