A simulated annealing-based maximum-margin clustering algorithm

Maximum-margin clustering (MMC) extends the support vector machine (SVM) to clustering. It partitions a set of unlabelled data into groups by finding the hyperplanes with the largest margins. Although existing algorithms have shown promising results, they are not guaranteed to converge to a global solution because the optimization problem is non-convex. In this paper, we propose a simulated annealing-based algorithm that addresses the issue of local minima in the MMC problem. The novelty of our algorithm is twofold: (1) it comprises a comprehensive cluster modification scheme based on simulated annealing, and (2) it introduces a new approach that combines k-means++ and SVM at each step of the annealing process. More precisely, k-means++ is first applied to extract subsets of the data points, and an unsupervised SVM is then applied to improve the clustering results. Experimental results on various benchmark datasets (the largest with over a million points) indicate that the proposed algorithm solves the clustering problem more effectively than a number of popular clustering algorithms.
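To make the annealing idea concrete, the following is a minimal sketch of a simulated-annealing loop over cluster label assignments. It is not the algorithm proposed in the paper: the k-means++ and unsupervised SVM steps are omitted, and the energy is the within-cluster sum of squares, used here only as a simple stand-in for the margin objective. The function names (`energy`, `anneal_clusters`), the geometric cooling schedule, and all parameter defaults are assumptions chosen for illustration.

```python
import math
import random


def energy(points, labels, k):
    """Within-cluster sum of squared distances.

    Assumption: a stand-in objective for illustration, not the MMC margin.
    """
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


def anneal_clusters(points, k, t_start=1.0, t_end=1e-3, cooling=0.95,
                    moves_per_temp=50, seed=0):
    """Simulated annealing over label assignments (hypothetical helper)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # random initial clustering
    current = energy(points, labels, k)
    best_labels, best = labels[:], current
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Propose moving one randomly chosen point to another cluster.
            i = rng.randrange(len(points))
            old, new = labels[i], rng.randrange(k)
            if new == old:
                continue
            labels[i] = new
            candidate = energy(points, labels, k)
            delta = candidate - current
            # Metropolis rule: always accept improvements, and accept
            # worse moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = candidate
                if current < best:
                    best, best_labels = current, labels[:]
            else:
                labels[i] = old  # reject the move and restore the label
        t *= cooling  # geometric cooling (an assumed schedule)
    return best_labels, best


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    found_labels, found_energy = anneal_clusters(data, k=2)
    print(found_labels, found_energy)
```

Accepting some worse moves at high temperature is what lets the search leave a poor local minimum. In this sketch the proposal step is a single-point relabelling, which is the simplest choice and is not the cluster modification scheme described in the abstract.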
