Development of a virtual screening method for identification of "frequent hitters" in compound libraries.

A computer-based method was developed for the rapid, automatic identification of potential "frequent hitters": compounds that show up as hits in many different biological assays covering a wide range of targets. A scoring scheme was derived from substructure analysis and from multivariate linear and nonlinear statistical methods applied to several sets of one- and two-dimensional molecular descriptors. The final model is based on a three-layered neural network and yields a predictive Matthews correlation coefficient of 0.81. The system correctly classified 90% of the test set molecules in a 10-times cross-validation study. Applied to database filtering, the method flagged between 8% (compilation of trade drugs) and 35% (Available Chemicals Directory) of compounds as potential frequent hitters. This filter will be a valuable tool for the prioritization of compounds from large databases, for compound purchase and biological testing, and for building new virtual libraries.
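
For readers unfamiliar with the Matthews correlation coefficient, the following minimal Python sketch shows how it relates to plain classification accuracy for a binary "frequent hitter" versus "non-frequent hitter" classifier. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study, and the function names are illustrative rather than part of the published method.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator vanishes (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all molecules that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical, balanced test set of 100 molecules (assumed values):
# 45 frequent hitters found, 45 non-hitters correctly passed,
# 5 false alarms, 5 missed frequent hitters.
tp, tn, fp, fn = 45, 45, 5, 5

print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

On a balanced set like this one, 90% accuracy corresponds to a coefficient of 0.80. Unlike accuracy, the coefficient drops toward zero when a classifier merely predicts the majority class, which makes it the more informative figure for unbalanced compound sets.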