Application of predictive QSAR models to database mining: identification and experimental validation of novel anticonvulsant compounds.

We have developed a drug discovery strategy that employs variable selection quantitative structure-activity relationship (QSAR) models for chemical database mining. The approach starts with the development of rigorously validated QSAR models obtained with the variable selection k-nearest-neighbor (kNN) method (or, in principle, with any other robust model-building technique). Model validation is based on several statistical criteria, including randomization of the target property (Y-randomization), independent assessment of the training set model's predictive power using external test sets, and establishment of the model's applicability domain. All successful models are employed in database mining concurrently; in each case, only the variables selected during model building (termed the descriptor pharmacophore) are used in chemical similarity searches that compare active compounds of the training set (queries) with those in chemical databases. For database entries that fall within a predefined similarity threshold of the training set molecules, the biological activity characteristic of the training set compounds is then predicted with the validated QSAR models, subject to the applicability domain criteria. Compounds predicted to be highly active by all or the majority of the models are considered consensus hits. We report the first application of this computational strategy to the discovery of anticonvulsant agents in the Maybridge and National Cancer Institute (NCI) databases, which together contain ca. 250,000 compounds. Forty-eight anticonvulsant agents of the functionalized amino acid (FAA) series were used to build kNN variable selection QSAR models. The 10 best models were applied to mining chemical databases, and 22 compounds were selected as consensus hits.
Nine compounds were synthesized and tested at the NIH Epilepsy Branch (Rockville, MD) in the same biological test that was used to assess the anticonvulsant activity of the training set compounds; of these nine, four were exact database hits and five were derived from the hits by minor chemical modifications. Seven of these nine compounds (78%) were confirmed to be active, indicating an exceptionally high hit rate. The approach described in this report can be used as a general rational drug discovery tool.
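
The consensus-hit step described above can be illustrated with a minimal sketch in Python. It is not the authors' implementation, and several details are assumptions made only for illustration: the class `KnnModel` and the function `consensus_hits` are hypothetical names; each model predicts activity as the unweighted mean over its k nearest training compounds; the applicability domain cutoff is taken as the mean plus `z` standard deviations of the training set nearest-neighbor distances, with `z = 0.5` chosen arbitrarily; the activity score is assumed to increase with potency; and all descriptor values and activities are invented toy data.

```python
from math import dist
from statistics import mean, pstdev


def project(x, selected):
    """Keep only the descriptors chosen by a model (its descriptor pharmacophore)."""
    return [x[i] for i in selected]


class KnnModel:
    """Toy kNN QSAR model with a distance-based applicability domain (assumed form)."""

    def __init__(self, train_x, train_y, selected, k=2, z=0.5):
        self.selected = selected
        self.k = k
        self.x = [project(x, selected) for x in train_x]
        self.y = list(train_y)
        # Collect each training compound's distances to its k nearest training neighbors.
        nn = []
        for i, xi in enumerate(self.x):
            d = sorted(dist(xi, xj) for j, xj in enumerate(self.x) if j != i)
            nn.extend(d[:k])
        # Assumed cutoff: mean + z * standard deviation of those distances.
        self.cutoff = mean(nn) + z * pstdev(nn)

    def predict(self, x):
        """Return the predicted activity, or None if x is outside the applicability domain."""
        q = project(x, self.selected)
        ranked = sorted((dist(q, xi), yi) for xi, yi in zip(self.x, self.y))[: self.k]
        if ranked[0][0] > self.cutoff:  # nearest training compound is too far away
            return None
        return mean(y for _, y in ranked)  # unweighted mean, a simplification


def consensus_hits(models, database, active_cutoff, min_fraction=0.5):
    """Names of compounds predicted active by more than min_fraction of the models."""
    hits = []
    for name, x in database.items():
        preds = [m.predict(x) for m in models]
        votes = sum(1 for p in preds if p is not None and p >= active_cutoff)
        if votes > min_fraction * len(models):
            hits.append(name)
    return hits


# Invented toy data: four training compounds, three descriptors each.
train_x = [[0.0, 0.0, 5.0], [0.1, 0.0, 1.0], [1.0, 1.0, 3.0], [1.1, 1.0, 9.0]]
train_y = [1.0, 1.2, 3.0, 3.2]

# Three models that differ only in which descriptors they selected.
models = [KnnModel(train_x, train_y, sel) for sel in ([0, 1], [0], [2])]

database = {
    "cmpd_A": [1.05, 1.0, 0.0],  # close to the two most active training compounds
    "cmpd_B": [0.05, 0.0, 0.0],  # close to the two least active training compounds
    "cmpd_C": [5.0, 5.0, 0.0],   # far from the training set for the first two models
}

hits = consensus_hits(models, database, active_cutoff=2.5)
print(hits)
```

In this toy setup a compound is retained only if more than half of the models both place it inside their applicability domain and predict an activity at or above `active_cutoff`; requiring agreement from all models instead would correspond to setting `min_fraction` just below 1.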