Improving Docking-Based Virtual Screening Ability by Integrating Multiple Energy Auxiliary Terms from Molecular Docking Scoring

Virtual Screening (VS) based on molecular docking is an efficient method used for retrieving novel hit compounds in drug discovery. However, the accuracy of current docking scoring function (SF) is usually insufficient. In this study, in order to improve the screening power of SF, a novel approach named EAT-Score was proposed by directly utilizing the energy auxiliary terms (EAT) provided by molecular docking scoring through eXtreme Gradient Boosting (XGBoost). Here, the EAT was defined as the output of the Molecular Operating Environment (MOE) scoring, including the energy scores of five different classical SFs and the Protein-Ligand Interaction Fingerprint (PLIF) terms. The performance of EAT-Score to discriminate actives from decoys was strictly validated on the DUD-E diverse subset by using different performance metrics. The results showed that EAT-Score performed much better than classical SFs in VS, with its AUC values exhibiting an improvement of around 0.3. Meanwhile, EAT-Score could achieve comparable even better prediction performance compared with other state-of-the-art VS methods, such as some machine learning (ML)-based SFs and classical SFs implemented in docking programs, whichever in terms of AUC, LogAUC or BEDROC. Furthermore, the EAT-Score model can capture important binding pattern information from protein-ligand complexes by Shapley additive explanations (SHAP) analysis, which may be very helpful in interpreting the ligand binding mechanism for a certain target and thereby guiding drug design.