On Strategies to Fix Degenerate k-means Solutions

Abstract

k-means is a benchmark algorithm in cluster analysis. It belongs to the large family of heuristics based on location-allocation steps, which alternately locate cluster centers and allocate data points to them until no further improvement is possible. Such heuristics are known to suffer from a phenomenon called degeneracy, in which some of the clusters are empty. In this paper, we compare and propose a series of strategies to circumvent degenerate solutions during a k-means execution. Our computational experiments show that these strategies are effective, leading to better clustering solutions in the vast majority of cases in which degeneracy appears in k-means. Moreover, we compare the use of our fixing strategies within k-means against the use of two initialization methods found in the literature. These results demonstrate how useful the proposed strategies can be, especially inside memory-based clustering algorithms.
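The location-allocation loop, and the way it can end with an empty cluster, is easy to see on a toy instance. The Python sketch below runs plain k-means (alternating allocation and location steps) and can optionally apply one simple repair rule whenever a cluster comes out empty: the empty cluster's center is moved onto the point that currently contributes most to the sum-of-squares objective, taken from a cluster that keeps at least one point. The repair rule, the function names (`allocate`, `locate`, `repair_empty`, `kmeans`, `sse`, `n_empty`) and the data are illustrative assumptions; the rule is not presented as one of the specific strategies evaluated in the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def allocate(points, centers):
    """Allocation step: assign every point to its nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def locate(points, labels, centers):
    """Location step: move each center to the mean of its points.

    A cluster that received no points keeps its previous center; this is
    the degenerate (empty-cluster) case."""
    k, dim = len(centers), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    new_centers = [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
                   for j in range(k)]
    return new_centers, counts


def repair_empty(points, labels, centers, counts):
    """Hypothetical repair rule (an assumption, not necessarily the paper's):
    for each empty cluster, take the point farthest from its own center among
    clusters holding more than one point, and make it the new center.
    Mutates labels, centers and counts in place."""
    for j in range(len(centers)):
        if counts[j] > 0:
            continue
        donors = [i for i in range(len(points)) if counts[labels[i]] > 1]
        if not donors:
            break  # every non-empty cluster is a singleton: nothing to move
        i = max(donors, key=lambda idx: sq_dist(points[idx], centers[labels[idx]]))
        counts[labels[i]] -= 1
        labels[i] = j
        counts[j] = 1
        centers[j] = list(points[i])


def kmeans(points, init_centers, fix=True, max_iter=100):
    """Alternate location and allocation steps until the partition stops changing."""
    centers = [list(c) for c in init_centers]
    labels = allocate(points, centers)
    for _ in range(max_iter):
        centers, counts = locate(points, labels, centers)
        if fix:
            repair_empty(points, labels, centers, counts)
        new_labels = allocate(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    centers, _ = locate(points, labels, centers)  # final centers match final labels
    return centers, labels


def sse(points, centers, labels):
    """Minimum sum-of-squares objective of a partition."""
    return sum(sq_dist(p, centers[j]) for p, j in zip(points, labels))


def n_empty(labels, k):
    """Number of clusters that own no point."""
    return k - len(set(labels))


if __name__ == "__main__":
    pts = [[0.0], [1.0], [10.0], [11.0]]
    init = [[0.5], [10.5], [100.0]]  # third center starts far from all data
    for fix in (False, True):
        c, lab = kmeans(pts, init, fix=fix)
        print(f"fix={fix}: SSE={sse(pts, c, lab)}, empty clusters={n_empty(lab, 3)}")
```

On the four points 0, 1, 10 and 11 with initial centers 0.5, 10.5 and 100, the third center attracts no points, so plain k-means stops immediately with one empty cluster and an objective of 1.0. With the repair enabled, the empty center is moved onto point 0, the next allocation step leaves the partition {0}, {1}, {10, 11} unchanged, and the run ends with three non-empty clusters and an objective of 0.5. Repairing inside the loop, rather than only reseeding at the start, is what lets the heuristic recover from degeneracy that appears mid-run.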

Nenad Mladenović | Daniel Aloise | Daniel Nobre Pinheiro | Nielsen Castelo Damasceno
