A robust iterative refinement clustering algorithm with smoothing search space

Iterative refinement clustering algorithms are widely used in data mining, but they are sensitive to initialization. Over the past decades, many modified initialization methods have been proposed to reduce this sensitivity. In essence, an iterative refinement clustering algorithm is a local search method. The large number of local minima embedded in the search space makes the local search hard and sensitive to initialization. The fewer local minima the search space contains, the more robust a local search algorithm is to its initialization. In this paper, we propose a Top-Down Clustering algorithm with Smoothing Search Space (TDCS3) to reduce the influence of initialization. The main steps of TDCS3 are: (1) dynamically reconstruct a series of smoothed search spaces into a hierarchical structure by 'filling' the local minimum points; (2) at the top level of the hierarchical structure, run an existing iterative refinement clustering algorithm with random initialization to generate a clustering result; (3) from the second level down to the bottom level of the hierarchical structure, run the same clustering algorithm with the initialization derived from the clustering result of the previous level. Experimental results on 3 synthetic and 10 real-world data sets show that TDCS3 has a significant effect on finding better, more robust clustering results and on reducing the impact of initialization.
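
The three steps above can be illustrated with a minimal Python sketch. It is not the TDCS3 implementation: the abstract does not specify how local minima are 'filled', so the `smooth` function below (Gaussian-weighted local averaging of the data with a bandwidth `h`) is a hypothetical stand-in for the smoothing operator. The base algorithm is assumed to be plain k-means, and all function names and the `bandwidths` schedule are illustrative assumptions.

```python
import numpy as np


def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iteration (the assumed base algorithm) from given centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return assign(X, centers), centers


def smooth(X, h):
    """HYPOTHETICAL smoothing operator, not the paper's 'filling' step.

    Each point is replaced by a Gaussian-weighted average of all points.
    A large h flattens fine structure; h <= 0 returns the data unchanged.
    """
    if h <= 0:
        return X.copy()
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * h * h))
    return (w @ X) / w.sum(axis=1, keepdims=True)


def centers_from_labels(X, labels, k, rng):
    """Derive initial centers at one level from the previous level's labels."""
    centers = np.empty((k, X.shape[1]))
    for j in range(k):
        members = X[labels == j]
        # Fall back to a random point if the previous level left cluster j empty.
        centers[j] = members.mean(axis=0) if len(members) else X[rng.integers(len(X))]
    return centers


def top_down_clustering(X, k, bandwidths, seed=0):
    """Run k-means from the most smoothed level down to the original data.

    `bandwidths` is an assumed schedule, ordered from heaviest smoothing
    (top level) to 0.0 (bottom level, the original data).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = None
    for h in bandwidths:
        Xs = smooth(X, h)
        if labels is None:
            # Top level: random initialization.
            init = Xs[rng.choice(len(Xs), size=k, replace=False)]
        else:
            # Lower levels: initialization from the previous clustering result.
            init = centers_from_labels(Xs, labels, k, rng)
        labels, centers = kmeans(Xs, init)
    return labels, centers
```

A call such as `top_down_clustering(data, k=3, bandwidths=[2.0, 1.0, 0.5, 0.0])` runs four levels. Passing labels rather than centers between levels is a design choice of this sketch: it keeps the initialization meaningful even though the smoothed data at each level occupy different positions.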