Suitability of molecular descriptors for database mining. A comparative analysis.

Database mining methods depend on the molecular descriptors used to characterize a structural database. In the present investigation, five types of descriptors (log P, UNITY fingerprints, ISIS keys, VolSurf, and GRIND) are applied to characterize databases of different size (n = 1007, 100, and 229) consisting almost exclusively of drugs. The validity of the descriptors is compared by principal component analysis (PCA) and its hierarchical variant, consensus principal component analysis (CPCA). Both pharmacodynamic and pharmacokinetic aspects of database mining are treated. For pharmacodynamic aspects, the clustering behavior achieved with the different descriptors is tested on the chemically homogeneous beta-blockers, benzodiazepines, and penicillins and on the chemically more diverse class I antiarrhythmics. The following ranking of clustering performance is observed: UNITY fingerprints > ISIS keys and GRIND > VolSurf > log P. Regarding information content, the CPCA superweight plot indicates similarity between fingerprints and ISIS keys as well as between VolSurf and log P, while GRIND differs from all the remaining descriptors. Solubility data and blood-brain barrier penetration serve as test cases for pharmacokinetic aspects. Comparison of the descriptors applied to these data reveals that VolSurf shows the most realistic and consistent behavior, GRIND shows intermediate behavior, and UNITY fingerprints and ISIS keys are not well suited for pharmacokinetic profiling. From this comparative analysis, we conclude that VolSurf descriptors offer particular advantages in treating pharmacokinetic aspects, whereas UNITY fingerprints, ISIS keys, and GRIND descriptors are of special value for tackling pharmacodynamic aspects of database mining. The parameter log P is of limited applicability in database mining because the available data are rather unreliable and incomplete.
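
The idea of comparing several descriptor blocks in one multivariate model can be illustrated with a minimal sketch. It does not reproduce the CPCA algorithm or the software used in this work. Instead it runs an ordinary PCA on autoscaled descriptor blocks that are each given the same total weight and then joined column-wise, a simplified stand-in that still shows how much each block contributes to each component. The function name `block_scaled_pca`, the block sizes, and the random data are all hypothetical and serve only as placeholders for real descriptor matrices.

```python
import numpy as np

def block_scaled_pca(blocks, n_components=2):
    """PCA on autoscaled, equally weighted descriptor blocks joined column-wise.

    A simplified stand-in for consensus PCA, not the CPCA algorithm itself.
    """
    scaled = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        X = X - X.mean(axis=0)              # center each descriptor
        std = X.std(axis=0, ddof=1)
        std[std == 0] = 1.0                 # constant columns stay at zero
        X = X / std                         # unit variance per descriptor
        norm = np.linalg.norm(X)
        # Give every block the same total sum of squares, so that a large
        # block (e.g. a fingerprint) cannot dominate a one-column block.
        scaled.append(X / norm if norm > 0 else X)

    Z = np.hstack(scaled)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # compounds x components
    loadings = Vt[:n_components].T                    # descriptors x components

    # Share of each component's squared loadings carried by each block.
    edges = np.cumsum([0] + [b.shape[1] for b in scaled])
    block_share = np.array([
        (loadings[edges[i]:edges[i + 1]] ** 2).sum(axis=0)
        for i in range(len(scaled))
    ])
    return scores, loadings, block_share

# Synthetic placeholder data: 20 "compounds", three hypothetical blocks.
rng = np.random.default_rng(0)
blocks = [
    rng.normal(size=(20, 1)),   # e.g. a single property such as log P
    rng.normal(size=(20, 8)),   # e.g. a short structural key
    rng.normal(size=(20, 5)),   # e.g. a set of 3D-derived descriptors
]
scores, loadings, block_share = block_scaled_pca(blocks, n_components=2)
print(block_share)  # rows: blocks, columns: components; each column sums to 1
```

Plotting the rows of `block_share` against each other gives a rough analogue of a block-weight plot: blocks that load on the same components carry similar information, while a block with a distinct pattern contributes something the others do not.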