Comparison of Molecular Fingerprint Methods on the Basis of Biological Profile Data

In this study, we evaluated a set of molecular fingerprint methods with respect to their ability to reproduce similarities in biological activity space. The evaluation presented in this paper therefore differs from many other fingerprint studies, which examined the enrichment of active compounds binding to the same target as selected query structures. In contrast, our data set was extracted from the BioPrint database, which contains uniformly derived biological activity profiles of mainly marketed drugs for a range of biological assays relevant to the pharmaceutical industry. We compared the calculated molecular fingerprint similarity values of all compound pairs in the data set with the corresponding similarities in biological activity space, and additionally analyzed the agreement of the generated clusterings. A closer analysis of the compound pairs with high biological activity similarity revealed that fingerprint methods such as CHEMGPS or TRUST4, which describe global features of a molecule such as physicochemical properties and pharmacophore patterns, might be better suited to describing the similarity of biological activity profiles than purely structural fingerprint methods. We therefore suggest that using these fingerprint methods could increase the probability of finding molecules with a similar biological activity profile but a different chemical structure.
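The pairwise comparison described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: fingerprints are represented as sets of on-bit indices and compared with the Tanimoto coefficient, biological profiles are compared with the Pearson correlation, and agreement between the two similarity spaces is summarized with a Spearman rank correlation. The compound names and all values are hypothetical toy data, not BioPrint data, and the measures actually used in the study may differ.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise(items, similarity):
    """Similarity for every unordered pair, in a fixed pair order."""
    return [similarity(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)]


# Hypothetical toy data: four compounds, each with a structural fingerprint
# (on-bit indices) and an activity profile (e.g. % inhibition in five assays).
fingerprints = [{1, 2, 3, 8}, {1, 2, 3, 9}, {4, 5, 6}, {2, 5, 7, 8}]
profiles = [
    [90.0, 10.0, 55.0, 5.0, 70.0],
    [85.0, 15.0, 60.0, 0.0, 65.0],
    [10.0, 80.0, 20.0, 75.0, 5.0],
    [88.0, 12.0, 50.0, 8.0, 72.0],
]

fp_sims = pairwise(fingerprints, tanimoto)   # similarity in fingerprint space
bio_sims = pairwise(profiles, pearson)       # similarity in activity space
rho = spearman(fp_sims, bio_sims)            # agreement between the two spaces
print(f"Spearman correlation over {len(fp_sims)} pairs: {rho:.3f}")
```

In this toy data, the first and fourth compounds share few fingerprint bits yet have nearly parallel activity profiles, which is the kind of pair a purely structural fingerprint would be expected to miss.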