Efficiently finding the optimum number of clusters in a dataset with a new hybrid differential evolution algorithm: DELA

Clustering algorithms, a fundamental basis for data mining procedures and learning techniques, lack efficient methods for determining the optimal number of clusters in an arbitrary dataset. The few methods existing in the literature all use some form of evolutionary algorithm with a cluster validation index as its objective function. This article proposes a new evolutionary algorithm for the same task, based on a hybrid model of global and local heuristic search, and evaluates it experimentally on different datasets and validation indices. Because its design is independent of any particular clustering procedure, it is applicable to virtually any clustering method, such as the widely used $k$-means algorithm. Moreover, non-parametric statistical tests on the experimental results clearly show the proposed algorithm to be more efficient than other evolutionary algorithms currently used for the same task.
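To make the objective concrete, the following minimal sketch treats a cluster validation index as a function of the number of clusters and picks the minimizer. It is not DELA: it replaces the evolutionary search with a plain exhaustive scan over a small range of candidate values, and it assumes $k$-means as the clustering procedure and the Davies-Bouldin index as the validation index. The function names (`kmeans`, `davies_bouldin`, `best_k`), the parameter defaults, and the synthetic data are all illustrative assumptions.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index (lower is better); inf for degenerate partitions."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        if not members:
            return math.inf
        scatter.append(
            sum(math.dist(p, centroids[j]) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            d = math.dist(centroids[i], centroids[j])
            if d == 0.0:
                return math.inf
            worst = max(worst, (scatter[i] + scatter[j]) / d)
        total += worst
    return total / k


def best_k(points, k_min=2, k_max=6, restarts=5):
    """Exhaustive scan: best index value per k over a few k-means restarts."""
    scores = {}
    for k in range(k_min, k_max + 1):
        scores[k] = min(davies_bouldin(points, *kmeans(points, k, seed=s))
                        for s in range(restarts))
    return min(scores, key=scores.get), scores


# Illustrative data: three well-separated 2-D Gaussian blobs.
_rng = random.Random(42)
demo_points = [(cx + _rng.gauss(0, 0.5), cy + _rng.gauss(0, 0.5))
               for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(30)]
k_star, index_by_k = best_k(demo_points)
print(k_star, index_by_k)
```

The scan costs one full clustering per candidate value and per restart, which is the expense that motivates a guided search such as an evolutionary algorithm when the candidate range or the dataset is large.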
