Positive Predictive Value Surfaces as a Complementary Tool to Assess the Performance of Virtual Screening Methods

Background: Since their introduction in the virtual screening field, metrics derived from the Receiver Operating Characteristic (ROC) curve have been widely used to benchmark computational methods and algorithms intended for virtual screening applications. Whereas in classification problems the relationship between sensitivity and specificity at a given score value is very informative, a practical concern in virtual screening campaigns is to predict the actual probability that a predicted hit will prove truly active when submitted to experimental testing, that is, its Positive Predictive Value (PPV). Estimating this probability is, however, hampered by its dependence on the yield of actives in the screened library, which cannot be known a priori.

Objective: To explore the use of PPV surfaces derived from simulated ranking experiments (retrospective virtual screening) as a complementary tool to ROC curves, both for benchmarking and for optimizing score cutoff values.

Methods: The utility of the proposed approach is assessed in retrospective virtual screening experiments with four datasets used to infer QSAR classifiers: inhibitors of Trypanosoma cruzi trypanothione synthetase, inhibitors of Trypanosoma brucei N-myristoyltransferase, inhibitors of GABA transaminase, and anticonvulsant activity in the 6 Hz seizure model.

Results: Besides illustrating the utility of PPV surfaces for comparing the performance of machine learning models in virtual screening applications and for selecting an adequate score threshold, our results also suggest that ensemble learning yields models with better predictivity and more robust behavior.

Conclusion: PPV surfaces are valuable tools for assessing virtual screening methods and for choosing the score thresholds to be applied in prospective in silico screens. Ensemble learning approaches appear to consistently improve predictivity and robustness.
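The dependence of PPV on the yield of actives follows from Bayes' theorem. For a score cutoff with sensitivity Se and specificity Sp, and an assumed yield of actives Ya, PPV = Se·Ya / (Se·Ya + (1 − Sp)(1 − Ya)). A PPV surface can therefore be obtained by evaluating this expression over a grid of score cutoffs and plausible yields. The following is a minimal sketch in Python with NumPy, assuming that higher scores indicate a higher likelihood of activity and that Se and Sp estimated in a retrospective screen carry over to the prospective library. The function names (sensitivity_specificity, ppv, ppv_surface) and the toy data are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np


def sensitivity_specificity(scores, labels, threshold):
    """Return (Se, Sp) when compounds scoring >= threshold are called active.

    Assumes higher scores mean "more likely active" and that `labels`
    contains both actives (1/True) and inactives (0/False).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if labels.all() or not labels.any():
        raise ValueError("labels must contain both actives and inactives")
    predicted = scores >= threshold
    se = np.sum(predicted & labels) / np.sum(labels)     # true positive rate
    sp = np.sum(~predicted & ~labels) / np.sum(~labels)  # true negative rate
    return float(se), float(sp)


def ppv(se, sp, ya):
    """Bayes' theorem: P(active | predicted hit) for an assumed yield of actives ya."""
    ya = np.asarray(ya, dtype=float)
    tp = se * ya                   # expected fraction of true positives
    fp = (1.0 - sp) * (1.0 - ya)   # expected fraction of false positives
    denom = tp + fp
    # PPV is undefined (NaN) when no compound is expected to be predicted active.
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, tp / safe, np.nan)


def ppv_surface(scores, labels, thresholds, yields):
    """PPV grid: one row per score threshold, one column per assumed yield."""
    yields = np.asarray(yields, dtype=float)
    surface = np.empty((len(thresholds), yields.size))
    for i, t in enumerate(thresholds):
        se, sp = sensitivity_specificity(scores, labels, t)
        surface[i, :] = ppv(se, sp, yields)
    return surface


if __name__ == "__main__":
    # Hypothetical retrospective screen: classifier scores and true labels.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 0, 0]
    thresholds = [0.25, 0.5, 0.75]
    yields = [0.001, 0.01, 0.05, 0.2]  # assumed range of plausible yields
    surface = ppv_surface(scores, labels, thresholds, yields)
    for t, row in zip(thresholds, surface):
        print(f"cutoff {t:.2f}: " + "  ".join(f"{v:.3f}" for v in row))
```

Each row shows how the expected PPV of one cutoff changes with the unknown yield of actives, so one could, for instance, pick the cutoff whose PPV stays acceptable across the range of yields considered realistic for the library to be screened.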