The Importance of the Regression Model in the Structure-Based Prediction of Protein-Ligand Binding

Docking is a key computational method for the structure-based design of starting points in the drug discovery process. Recently, the use of non-parametric machine learning to circumvent modelling assumptions has been shown to yield a large improvement in the accuracy of docking. As a result, machine-learning scoring functions outperform classical scoring functions by a wide margin. Classical scoring functions are characterised by their reliance on a predetermined, theory-inspired functional form for the relationship between the variables that describe the complex and its predicted binding affinity.
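The contrast between a predetermined functional form and a non-parametric regressor can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (a single made-up descriptor and a sine-shaped "affinity"), the fixed form is a straight line, and a k-nearest-neighbour average stands in for the non-parametric learner. It is not one of the scoring functions discussed in the abstract and uses no real binding data.

```python
import math
import random


def make_data(n, rng):
    """Synthetic toy data: one descriptor x and a non-linear 'affinity' y plus noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Predetermined functional form y = a*x + b, fitted by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_knn(xs, ys, k=5):
    """Non-parametric stand-in: average the targets of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def rmse(model, xs, ys):
    """Root-mean-square error of a fitted model on held-out data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


rng = random.Random(0)  # fixed seed so the toy comparison is repeatable
train_x, train_y = make_data(300, rng)
test_x, test_y = make_data(100, rng)

linear_rmse = rmse(fit_linear(train_x, train_y), test_x, test_y)
knn_rmse = rmse(fit_knn(train_x, train_y), test_x, test_y)

print(f"fixed linear form, test RMSE: {linear_rmse:.3f}")
print(f"k-nearest neighbours, test RMSE: {knn_rmse:.3f}")
```

On this toy problem the straight line cannot follow the curved relationship, whereas the neighbour average makes no assumption about its shape. The comparison only shows the mechanism; it says nothing about how large the gap is on real protein-ligand complexes.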
