On the significance of clusters in the graphical display of structure-activity data.

A method is presented to evaluate the statistical significance of an apparently clustered group in the graphical display of structure-activity data. Two variations are described; each is implemented by means of a computer program. The first is applicable to relatively small sets of compounds, where a complete enumeration of all possible clusters can reasonably be accomplished on a high-speed electronic computer. The second is applicable in cases where such a calculation would be too time-consuming; this variation uses random sampling of the set of all possible clusters. An application of each variation is given: for the smaller case, a reevaluation of a study on aminotetralin and aminoindan monoamine oxidase inhibitors; for the larger case, the discovery of physical parameters that influence mutagenicity among some aminoacridine derivatives. It is proposed that this new technique be called cluster significance analysis (CSA).
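
The two variations described above can be illustrated with a minimal sketch. The abstract does not state which tightness statistic is used, so the mean squared distance between group members is an assumption made here for illustration, as are all function and parameter names (`mean_squared_distance`, `csa_exact`, `csa_sampled`, `n_samples`, `seed`). The significance estimate is the fraction of equally sized groups that are at least as tight as the observed group of active compounds.

```python
import itertools
import random


def mean_squared_distance(points):
    """Mean squared Euclidean distance over all pairs (assumed tightness statistic)."""
    pairs = list(itertools.combinations(points, 2))
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


def csa_exact(points, active_idx, tol=1e-12):
    """Complete enumeration: fraction of all same-size groups at least as tight."""
    k = len(active_idx)
    if k < 2:
        raise ValueError("need at least two active compounds")
    observed = mean_squared_distance([points[i] for i in active_idx])
    count = total = 0
    for subset in itertools.combinations(range(len(points)), k):
        total += 1
        if mean_squared_distance([points[i] for i in subset]) <= observed + tol:
            count += 1
    return count / total


def csa_sampled(points, active_idx, n_samples=10000, seed=0, tol=1e-12):
    """Random sampling: estimate the same fraction from randomly drawn groups."""
    k = len(active_idx)
    if k < 2:
        raise ValueError("need at least two active compounds")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = mean_squared_distance([points[i] for i in active_idx])
    indices = range(len(points))
    count = 0
    for _ in range(n_samples):
        subset = rng.sample(indices, k)  # draw k distinct compounds
        if mean_squared_distance([points[i] for i in subset]) <= observed + tol:
            count += 1
    return count / n_samples


if __name__ == "__main__":
    # Hypothetical two-parameter data: the first three compounds are the actives.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 1.0), (2.0, 8.0)]
    print(csa_exact(data, [0, 1, 2]))
    print(csa_sampled(data, [0, 1, 2], n_samples=5000))
```

In this made-up example there are 20 possible groups of three among six compounds, and only the active group itself is as tight as the observed one, so the enumerated fraction is 1/20. The sampled estimate approaches the same value as `n_samples` grows, which is the trade-off that motivates the second variation when the number of possible groups is too large to enumerate.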