How many clusters? A robust PSO-based local density model

While most clustering methods assume that the number of clusters is known in advance, having the algorithm itself estimate that number automatically remains a challenging problem in data clustering. In this paper, we develop a novel local, non-differentiable clustering method based on Particle Swarm Optimization (PSO) that estimates the number of clusters automatically. Specifically, the proposed approach measures the local compactness of each cluster with a local density function, drives the PSO towards maximizing this compactness, and penalizes the whole procedure to avoid estimating too many clusters during the evolution. The compactness modeling makes the clustering robust to outliers and noise. In addition, owing to the properties of PSO, although the kernel trick is used in our modeling, memory consumption does not grow excessively as more data are processed. Evaluation on a synthetic dataset and five publicly available datasets shows that our algorithm estimates an appropriate number of clusters and outperforms six related state-of-the-art clustering methods that can also estimate the number of clusters.
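To make the idea of a penalized, density-based PSO objective concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. The Gaussian kernel, the `bandwidth` and `penalty` values, the activation-threshold encoding of the cluster count, and the function names `fitness` and `pso_cluster` are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def fitness(centers, active, data, bandwidth=1.0, penalty=0.05):
    """Hypothetical objective: kernel compactness minus a cluster-count penalty."""
    if not active.any():
        return -np.inf  # a solution with no clusters is invalid
    c = centers[active]
    # Squared distances between every point and every active center: shape (n, k).
    d2 = ((data[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel (assumed)
    # Each point counts only towards its best-matching center, so points far
    # from every center (outliers) contribute almost nothing.
    compactness = k.max(axis=1).mean()
    return compactness - penalty * active.sum()

def pso_cluster(data, max_k=6, n_particles=30, n_iter=100, seed=0,
                w=0.7, c1=1.5, c2=1.5):
    """Standard global-best PSO over (centers, activation thresholds)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    n_center = max_k * dim
    lo, hi = np.tile(data.min(axis=0), max_k), np.tile(data.max(axis=0), max_k)

    def decode(x):
        # A center is "active" when its threshold exceeds 0.5 (assumed encoding).
        return x[:n_center].reshape(max_k, dim), x[n_center:] > 0.5

    def score(x):
        centers, active = decode(x)
        return fitness(centers, active, data)

    pos = np.empty((n_particles, n_center + max_k))
    pos[:, :n_center] = rng.uniform(lo, hi, size=(n_particles, n_center))
    pos[:, n_center:] = rng.uniform(0.0, 1.0, size=(n_particles, max_k))
    vel = np.zeros_like(pos)

    pbest = pos.copy()
    pbest_val = np.array([score(x) for x in pos])
    best = pbest_val.argmax()
    gbest, gbest_val = pbest[best].copy(), pbest_val[best]

    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        pos[:, n_center:] = np.clip(pos[:, n_center:], 0.0, 1.0)
        vals = np.array([score(x) for x in pos])
        improved = vals > pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        best = pbest_val.argmax()
        if pbest_val[best] > gbest_val:
            gbest, gbest_val = pbest[best].copy(), pbest_val[best]

    centers, active = decode(gbest)
    return centers[active], gbest_val

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    blobs = [rng.normal(m, 0.3, size=(50, 2)) for m in ([0, 0], [4, 0], [2, 4])]
    found, value = pso_cluster(np.vstack(blobs))
    print(len(found), "active centers, fitness", round(float(value), 3))
```

In this sketch the fitness only ever evaluates kernels between the data and at most `max_k` candidate centers, so no full pairwise kernel matrix is stored. Whether this matches the memory argument of the paper is an assumption. The number of active centers returned depends on `bandwidth` and `penalty`, which would need tuning per dataset.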
