Assessing Clustering Reliability and Feature Informativeness by Random Permutations

Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the correct number of clusters or quantifying how strongly the data support the partition produced by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature, and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. From this ensemble it is possible to extract hints on how individual features affect the clustering outcome, how consistent the clustering result is, and which parameter value is most suitable (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.
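
The kind of permutation-based procedure outlined above can be sketched as follows. This is a minimal illustration, not the paper's actual method: the clustering algorithm (k-means), the quality index (silhouette score), and the number of permutations are placeholder assumptions chosen only to make the idea concrete.

```python
# Minimal sketch: build, for each feature, an empirical distribution of a
# clustering quality index obtained after randomly permuting that feature.
# All concrete choices below (k-means, silhouette, n_perm) are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def permutation_distributions(X, n_clusters=3, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)

    def quality(data):
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(data)
        return silhouette_score(data, labels)

    baseline = quality(X)                      # index on the original data
    dists = np.empty((X.shape[1], n_perm))     # one distribution per feature
    for j in range(X.shape[1]):
        for b in range(n_perm):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # destroy feature j's structure
            dists[j, b] = quality(Xp)
    return baseline, dists


if __name__ == "__main__":
    # Toy data: three well-separated groups in four dimensions.
    X = np.vstack([np.random.randn(50, 4) + m for m in (0, 4, 8)])
    baseline, dists = permutation_distributions(X)
    print("baseline index:", round(baseline, 3))
    print("per-feature permuted means:", dists.mean(axis=1).round(3))
```

In this sketch, a feature whose permutation distribution falls well below the baseline index is one the clustering structure relies on (informative), while a distribution overlapping the baseline suggests the feature contributes little; repeating the exercise over candidate parameter values (e.g. the number of clusters) gives the ensemble of empirical distributions the abstract refers to.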