The Use of Random Forest to Predict Binding Affinity in Docking

Docking is a structure-based computational tool that can be used to predict the strength with which a small-molecule ligand binds to a macromolecular target. Such binding affinity prediction is crucial for designing molecules that bind more tightly to a target and are therefore more likely to modulate the target’s biochemical function effectively. Despite intense research over the years, improving the accuracy of these predictions has proven very challenging for every class of method.
