A novel iterative partitioning approach for building prime clusters

Cluster analysis is an iterative process of knowledge discovery that segregates data objects into meaningful, logical groups. An effective clustering algorithm produces groups whose objects are tightly bound within each group and well separated between groups. In this paper, we propose a new iterative, non-parametric partitioning clustering algorithm, the Prime Equivalence Clustering Algorithm (PECA), which is based on computing distances over the subsets of attributes on which object values are close. The number of clusters and the final partition of the dataset are determined automatically, without any prior knowledge. We evaluated the algorithm on benchmark datasets, where it performed better than well-known clustering algorithms.
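The quality criterion stated above (objects tightly bound within a group, well separated between groups) can be checked with a simple pairwise comparison. The sketch below is a generic illustration and not PECA itself. It assumes numeric attributes and Euclidean distance, and the function name `cohesion_separation` is hypothetical. It returns the mean distance between objects in the same cluster and the mean distance between objects in different clusters. For a good partition, the first value should be much smaller than the second.

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def _mean(values: list[float]) -> float:
    # Mean of a list, or 0.0 if the list is empty (e.g. every point in one cluster).
    return sum(values) / len(values) if values else 0.0


def cohesion_separation(points: list[Point], labels: list[int]) -> tuple[float, float]:
    """Hypothetical helper: (mean within-cluster distance, mean between-cluster distance).

    Assumes Euclidean distance over all attributes; this is a generic
    compactness/separation check, not the PECA distance.
    """
    within: list[float] = []
    between: list[float] = []
    # Visit every unordered pair of objects exactly once.
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return _mean(within), _mean(between)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    good = cohesion_separation(pts, [0, 0, 1, 1])  # natural two-group split
    bad = cohesion_separation(pts, [0, 1, 0, 1])   # mixes distant objects
    print(good, bad)
```

In the usage example, the natural split gives a small within-cluster mean (1.0) and a large between-cluster mean. The mixed labelling reverses that relationship, so its within-cluster mean is the larger of the two values.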
