Qualitative properties of the minimum sum-of-squares clustering problem

This paper establishes fundamental qualitative properties of the minimum sum-of-squares clustering problem. We prove that the problem always has a global solution and that, under a mild condition, the global solution set is finite. Moreover, the components of each global solution can be computed by an explicit formula. Using a new concept of non-trivial local solution, we derive necessary conditions for a system of centroids to be such a local solution, and we show that these necessary conditions are also sufficient. Finally, we prove that the optimal value function is locally Lipschitz, the global solution map is locally upper Lipschitz, and the local solution map has the Aubin property, provided that the original data points are pairwise distinct. The resulting complete characterization of the non-trivial local solutions gives a better understanding of the performance of the k-means algorithm as well as of other solution methods for the problem.
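To make the objects in the abstract concrete, the following minimal sketch evaluates a sum-of-squares clustering objective and recomputes each centroid as the mean of the data points nearest to it. It is an illustration under stated assumptions, not the paper's formulation: the names `mssc_objective` and `cluster_means`, the 1/m normalization of the objective, the tie-breaking rule, and the toy data are all hypothetical, and the cluster-mean rule is only presumed to correspond to the explicit formula mentioned above.

```python
def squared_distance(a, x):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - xi) ** 2 for ai, xi in zip(a, x))


def mssc_objective(points, centroids):
    """Average, over all data points, of the squared distance to the nearest centroid.

    The 1/m factor is an assumed normalization; it does not change the minimizers.
    """
    total = sum(min(squared_distance(a, x) for x in centroids) for a in points)
    return total / len(points)


def cluster_means(points, centroids):
    """Assign each point to its nearest centroid, then return the mean of each cluster.

    Ties go to the centroid with the smallest index (an arbitrary choice).
    A centroid that attracts no point is returned unchanged.
    """
    clusters = [[] for _ in centroids]
    for a in points:
        dists = [squared_distance(a, x) for x in centroids]
        clusters[dists.index(min(dists))].append(a)

    updated = []
    for x, members in zip(centroids, clusters):
        if members:
            dim = len(members[0])
            updated.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
            )
        else:
            updated.append(tuple(x))
    return updated


# Toy data (hypothetical): two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
start = [(1.0, 1.0), (9.0, 1.0)]
means = cluster_means(points, start)
```

On this toy data, replacing `start` by `means` lowers the objective, which is the single step that the k-means algorithm repeats until the centroids stop changing.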
