Slice_OP: Selecting Initial Cluster Centers Using Observation Points

This paper proposes Slice_OP, a new algorithm for selecting initial cluster centers in high-dimensional data. A set of observation points is allocated to transform the high-dimensional data into one-dimensional distance data. Multiple Gamma mixture models are fitted to the distance data with the expectation-maximization algorithm, and the best-fitted model is selected with the second-order Akaike information criterion (AICc). Candidate initial centers are estimated from the objects in each component of the best-fitted model. A cluster tree is then built from the distance matrix of the candidate initial centers and divided into K branches, and the objects in each branch are analyzed with the k-nearest-neighbor algorithm to select the final initial cluster centers. Experimental results on synthetic and real-world datasets show that Slice_OP outperforms the state-of-the-art k-means++ algorithm and random center initialization for k-means.
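The pipeline described above can be sketched in code. This is a hypothetical illustration, not the authors' implementation: the observation points are drawn at random (the paper's allocation strategy is more principled), and a Gaussian mixture stands in for the paper's Gamma mixture, since scikit-learn's EM fitting only supports the Gaussian case. The AICc correction, the per-component candidate rule, and the kNN density proxy are all assumptions made for the sake of a runnable example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

def slice_op_sketch(X, K, n_obs_points=3, max_components=5, seed=0):
    """Hypothetical sketch of the Slice_OP pipeline (not the authors' code)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1: allocate observation points (here: random objects of X).
    obs_points = X[rng.choice(n, size=n_obs_points, replace=False)]

    # Steps 2-3: 1-D distance data per observation point, EM-fitted mixture
    # models selected by AICc, one candidate center per mixture component.
    candidate_idx = set()
    for p in obs_points:
        dist = np.linalg.norm(X - p, axis=1).reshape(-1, 1)
        best, best_aicc = None, np.inf
        for m in range(1, max_components + 1):
            gm = GaussianMixture(n_components=m, random_state=seed).fit(dist)
            k = 3 * m - 1  # free parameters of a 1-D m-component mixture
            aicc = gm.aic(dist) + 2 * k * (k + 1) / max(n - k - 1, 1)
            if aicc < best_aicc:
                best, best_aicc = gm, aicc
        labels = best.predict(dist)
        for c in range(best.n_components):
            members = np.flatnonzero(labels == c)
            # Candidate: the object nearest the component mean on the axis.
            candidate_idx.add(
                members[np.argmin(np.abs(dist[members, 0] - best.means_[c, 0]))])

    cand = np.array(sorted(candidate_idx))
    # Step 4: cluster tree over the candidates, cut into (at most) K branches.
    branches = fcluster(linkage(X[cand], method="average"),
                        t=K, criterion="maxclust")

    # Step 5: within each branch, keep the candidate with the smallest mean
    # k-NN distance (a density proxy) as an initial center.
    centers = []
    for b in np.unique(branches):
        idx = cand[branches == b]
        nn = NearestNeighbors(n_neighbors=min(3, len(idx))).fit(X[idx])
        mean_d = nn.kneighbors(X[idx])[0].mean(axis=1)
        centers.append(idx[np.argmin(mean_d)])
    return X[np.array(centers)]
```

The returned rows can be passed to k-means as its initial centers (e.g. `KMeans(n_clusters=K, init=centers, n_init=1)` in scikit-learn).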
