Loevinger's measures of rule quality for assessing cluster stability

A method is developed for measuring clustering stability under the removal of a few objects from the set of objects to be partitioned. Measures of the stability of an individual cluster are defined as Loevinger's measures of rule quality. The stability of an individual cluster can be interpreted as a weighted mean of the stabilities inherent in the isolation and in the cohesion of that cluster. The design of the method also makes it possible to measure the stability of a partition, which can be viewed as a weighted mean of the stability measures of all clusters in the partition. As a consequence, an approach is derived for determining the optimal number of clusters of a partition. Furthermore, a Monte Carlo test is used to compute a significance probability that assesses how likely a given stability value is under a null model specifying the absence of cluster stability. To illustrate the potential of the method, stability measures obtained with the batch K-Means algorithm on artificial data sets and on the Iris data set are presented.
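As a rough illustration of the two ingredients named in the abstract, the sketch below computes Loevinger's measure for a generic rule A -> B from counts, using the standard form H = (P(B|A) - P(B)) / (1 - P(B)), and a Monte Carlo significance probability from values simulated under a null model. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say how the rules for isolation and cohesion are built from the clusterings, and the function names, the count-based interface, and the (r + 1) / (N + 1) p-value convention are all assumptions made here for illustration.

```python
def loevinger(n_total, n_a, n_b, n_ab):
    """Loevinger's measure of the rule A -> B, from counts.

    n_total: number of cases; n_a: cases satisfying A;
    n_b: cases satisfying B; n_ab: cases satisfying both.
    Equals 1 when A always implies B and 0 when A and B are independent.
    (Hypothetical helper; the interface is an assumption.)
    """
    if n_a == 0 or n_b == n_total:
        raise ValueError("measure undefined when P(A) = 0 or P(B) = 1")
    confidence = n_ab / n_a      # P(B | A)
    p_b = n_b / n_total          # P(B)
    return (confidence - p_b) / (1.0 - p_b)


def monte_carlo_p_value(observed, null_values):
    """Share of null-model values at least as large as the observed one.

    Uses the common (r + 1) / (N + 1) convention, which is an assumption
    here and may differ from the convention used in the paper.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Toy usage with made-up counts and made-up null values.
h = loevinger(n_total=100, n_a=20, n_b=50, n_ab=20)
p = monte_carlo_p_value(h, [0.10, 0.35, 0.20, 0.05])
print(h, p)
```

In a stability analysis, the null values would come from re-running the same computation on data sets generated under the null model; the list above is a placeholder only.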
