Cluster Significance Analysis: A New QSAR Tool for Asymmetric Data Sets

Cluster Significance Analysis (CSA) and Soft Independent Modelling of Class Analogy (SIMCA) can complement each other in the analysis of asymmetric (embedded) data. CSA evaluates the significance of a set of descriptors as determinants of activity in a drug series. SIMCA gives a boundary for predicting the activity class of new members, and a “modelling power” for each individual descriptor. A sequential CSA approach is proposed and applied to a set of sulfamate sweetening agents that Miyashita et al. analyzed by SIMCA. Knowledge of the modelling power (3) saved substantial labor in conducting the sequential CSA. On the other hand, CSA suggested that the SIMCA analysis may have led to the inclusion of one more descriptor than necessary. Overall, however, there was good agreement between the conclusions drawn from the two methods.
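
As an illustration of the kind of test CSA performs, the following minimal sketch assumes the usual formulation: the tightness of the active group is measured by its mean squared distance (MSD) in descriptor space, and the significance is the fraction of all equally sized subsets of the series that cluster at least as tightly. The function names and the toy data are hypothetical and are not taken from the study; the descriptors are assumed to be already scaled, and exhaustive enumeration is practical only for small series.

```python
from itertools import combinations


def mean_squared_distance(points):
    """Mean squared Euclidean distance over all pairs of points."""
    pairs = list(combinations(points, 2))
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


def csa_p_value(points, active_idx):
    """Fraction of same-size subsets at least as tight as the active group.

    points     -- list of descriptor tuples, one per compound (assumed scaled)
    active_idx -- indices of the compounds classed as active
    """
    k = len(active_idx)
    observed = mean_squared_distance([points[i] for i in active_idx])
    as_tight = 0
    n_subsets = 0
    # Enumerate every subset of size k, including the active group itself.
    for subset in combinations(range(len(points)), k):
        n_subsets += 1
        msd = mean_squared_distance([points[i] for i in subset])
        if msd <= observed + 1e-12:  # small tolerance for rounding error
            as_tight += 1
    return as_tight / n_subsets


# Hypothetical one-descriptor series: three actives embedded among inactives.
toy_points = [(0.0,), (0.1,), (0.2,), (5.0,), (-5.0,), (9.0,)]
toy_actives = [0, 1, 2]
print(csa_p_value(toy_points, toy_actives))
```

A small value indicates that the actives cluster more tightly than would be expected by chance for the chosen descriptors. A sequential analysis could call such a function repeatedly on different descriptor subsets; the specific procedure proposed here is not reproduced in this sketch.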