Cluster validation using a probabilistic attributed graph

We propose a new cluster validity index. A data partition is described by a set of disjoint sub-graphs, each being the minimum spanning tree of one cluster, with edge weights given by the dissimilarity between the linked objects. Assuming that each cluster has a characteristic parametric distribution of dissimilarity increments, we estimate the probability of each graph. The validity index is then defined as the minimum description length of both the estimated model parameters and the data partition under this probabilistic model. The index can be used to compare partitions of a given data set obtained by (i) a single clustering algorithm, (ii) different clustering algorithms, or (iii) cluster ensemble methods. Experimental evaluation on synthetic data and on real data from the UCI repository confirms the usefulness of the index in selecting good clustering solutions.
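The pipeline described above (one minimum spanning tree per cluster, dissimilarity increments, a parametric fit, a description length) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the index defined in the paper: the abstract does not name the parametric family, so an exponential density is assumed here; a dissimilarity increment is assumed to be the absolute weight difference between two tree edges that share a vertex; and the parameter cost is a generic (1/2) log n term. The function names `mst_edges`, `increments`, and `description_length` are hypothetical.

```python
import math
from itertools import combinations


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, weight) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def increments(edges):
    """Assumed definition: |w1 - w2| for every pair of tree edges sharing a vertex."""
    incident = {}
    for i, j, w in edges:
        incident.setdefault(i, []).append(w)
        incident.setdefault(j, []).append(w)
    return [abs(a - b) for ws in incident.values() for a, b in combinations(ws, 2)]


def description_length(clusters):
    """Illustrative two-part code length in nats, summed over clusters.

    Assumes an exponential density for the increments (not stated in the abstract)
    and charges 0.5 * log(n) for the single fitted parameter of each cluster.
    """
    total = 0.0
    for pts in clusters:
        incs = increments(mst_edges(pts))
        if not incs:
            continue  # clusters with fewer than three points yield no increments
        n = len(incs)
        mean = max(sum(incs) / n, 1e-12)  # guard against a zero mean
        data_cost = n * math.log(mean) + sum(incs) / mean  # exponential negative log-likelihood
        param_cost = 0.5 * math.log(n)
        total += data_cost + param_cost
    return total
```

Each candidate partition is passed as a list of clusters, each cluster a list of coordinate tuples, for example `description_length([[(0, 0), (1, 0), (3, 0)]])`. Because the data cost comes from a continuous density, the resulting value can be negative and is meaningful only relative to other partitions scored the same way.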
