Slice_OP: Selecting Initial Cluster Centers Using Observation Points

This paper proposes Slice_OP, a new algorithm for selecting initial cluster centers in high-dimensional data. A set of observation points is allocated to transform the high-dimensional data into one-dimensional distance data. Multiple Gamma mixture models are fitted to the distance data with the expectation-maximization algorithm, and the best-fitted model is selected with the second-order Akaike information criterion (AICc). Candidate initial centers are estimated from the objects in each component of the best-fitted model. A cluster tree is then built from the distance matrix of the candidate initial centers and divided into K branches, and the objects in each branch are analyzed with the k-nearest-neighbor algorithm to select the final initial cluster centers. Experimental results on synthetic and real-world datasets show that Slice_OP outperforms the state-of-the-art k-means++ algorithm and random center initialization for k-means.
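The pipeline described above can be sketched in code. This is a hypothetical illustration, not the authors' implementation: the observation points are drawn at random (the paper's allocation strategy is more principled), and a Gaussian mixture stands in for the paper's Gamma mixture, since scikit-learn's EM fitting only supports the Gaussian case. The AICc correction, the per-component candidate rule, and the kNN density proxy are all assumptions made for the sake of a runnable example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

def slice_op_sketch(X, K, n_obs_points=3, max_components=5, seed=0):
    """Hypothetical sketch of the Slice_OP pipeline (not the authors' code)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1: allocate observation points (here: random objects of X).
    obs_points = X[rng.choice(n, size=n_obs_points, replace=False)]

    # Steps 2-3: 1-D distance data per observation point, EM-fitted mixture
    # models selected by AICc, one candidate center per mixture component.
    candidate_idx = set()
    for p in obs_points:
        dist = np.linalg.norm(X - p, axis=1).reshape(-1, 1)
        best, best_aicc = None, np.inf
        for m in range(1, max_components + 1):
            gm = GaussianMixture(n_components=m, random_state=seed).fit(dist)
            k = 3 * m - 1  # free parameters of a 1-D m-component mixture
            aicc = gm.aic(dist) + 2 * k * (k + 1) / max(n - k - 1, 1)
            if aicc < best_aicc:
                best, best_aicc = gm, aicc
        labels = best.predict(dist)
        for c in range(best.n_components):
            members = np.flatnonzero(labels == c)
            # Candidate: the object nearest the component mean on the axis.
            candidate_idx.add(
                members[np.argmin(np.abs(dist[members, 0] - best.means_[c, 0]))])

    cand = np.array(sorted(candidate_idx))
    # Step 4: cluster tree over the candidates, cut into (at most) K branches.
    branches = fcluster(linkage(X[cand], method="average"),
                        t=K, criterion="maxclust")

    # Step 5: within each branch, keep the candidate with the smallest mean
    # k-NN distance (a density proxy) as an initial center.
    centers = []
    for b in np.unique(branches):
        idx = cand[branches == b]
        nn = NearestNeighbors(n_neighbors=min(3, len(idx))).fit(X[idx])
        mean_d = nn.kneighbors(X[idx])[0].mean(axis=1)
        centers.append(idx[np.argmin(mean_d)])
    return X[np.array(centers)]
```

The returned rows can be passed to k-means as its initial centers (e.g. `KMeans(n_clusters=K, init=centers, n_init=1)` in scikit-learn).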
