"In-House Likeness": Comparison of Large Compound Collections Using Artificial Neural Networks

Binary classification models that can discriminate between data sets of compounds are useful tools in a range of applications, from compound acquisition to library design. In this paper we investigate the ability of artificial neural networks to discriminate between compound collections from various sources, with the aim of developing an "in-house likeness" scoring scheme (i.e., in-house vs. external compounds) for compound acquisition. Our analysis shows that atom-type-based Ghose-Crippen fingerprints, in combination with artificial neural networks, are an efficient way to construct such filters. A simple measure of the chemical overlap between different compound collections can be derived from the output scores of the neural network models.
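As an illustration of how an overlap measure might be derived from classifier output scores, the sketch below computes the fraction of external compounds that a model scores as in-house-like. This is only one plausible reading, not the definition used in the paper: the function name `overlap_fraction`, the convention that scores lie in [0, 1] with higher values meaning "more in-house-like", the 0.5 threshold, and the example scores are all assumptions made for illustration.

```python
from typing import Sequence


def overlap_fraction(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of compounds scored at or above `threshold`.

    Assumption: each score is a classifier output in [0, 1], where higher
    values mean the compound looks more like the in-house collection.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for s in scores if s >= threshold)
    return hits / len(scores)


# Hypothetical model outputs for eight external compounds (made-up values,
# not results from the paper).
external_scores = [0.91, 0.12, 0.55, 0.08, 0.73, 0.30, 0.02, 0.64]

overlap = overlap_fraction(external_scores)
print(f"Fraction of external compounds scored as in-house-like: {overlap:.2f}")
```

Under these assumptions, a value near 0 would suggest the two collections occupy largely distinct regions of chemical space as seen by the model, while a value near 1 would suggest the model can hardly tell them apart.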