Elephant search algorithm on data clustering

Data clustering is one of the most popular branches of machine learning and data analysis. Partitioning-based clustering algorithms such as K-means are prone to producing a set of clusters that is far from optimal, because their outcome depends on a random starting point. The clustering process begins with random initial partitions and then tries to improve them progressively, so different initial partitions can lead to different final clusters. Exhaustively trying all possible candidate partitions to find the best result is too time-consuming. Metaheuristic algorithms aim to search for a global optimum in high-dimensional problems, and they have been successfully applied to data clustering to seek near-optimal solutions in terms of the quality of the resulting clusters. In this paper, a new metaheuristic search method called the Elephant Search Algorithm (ESA) is proposed for integration into K-means, forming a new data clustering algorithm named C-ESA. The advantage of ESA lies in two features: (i) evolutionary operations and (ii) a balance between local intensification and global exploration. The results of C-ESA are compared with those of classical clustering algorithms, including K-means, DBSCAN, and GMM-EM, and a computer simulation shows that C-ESA outperforms these algorithms in clustering accuracy.
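To make the idea of wrapping K-means in a population-based metaheuristic more concrete, here is a minimal pure-Python sketch. It is not the ESA update rule set and not the C-ESA implementation; it is a hypothetical, generic stand-in that keeps two ingredients named above: a fitness function (assumed here to be the K-means objective, the within-cluster sum of squared errors, SSE) and a split of each generation between local intensification (Gaussian perturbation of the best centroids followed by K-means refinement) and global exploration (fresh random initializations). It does not implement evolutionary operators such as crossover. The function names and the parameters `pop_size`, `local_frac`, and `step` are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """K-means objective: total squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd(points, centroids, iters=10):
    """Plain K-means (Lloyd) refinement starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def population_kmeans(points, k, pop_size=10, generations=20,
                      local_frac=0.5, step=0.5, seed=0):
    """Hypothetical population-based wrapper around K-means (not the ESA update rules).

    Each generation splits its budget between local intensification around the
    best solution found so far and global exploration from fresh random starts.
    """
    rng = random.Random(seed)

    def random_start():
        # Pick k distinct data points as initial centroids.
        return [list(p) for p in rng.sample(points, k)]

    population = [lloyd(points, random_start()) for _ in range(pop_size)]
    best = min(population, key=lambda s: sse(points, s))
    best_cost = sse(points, best)

    n_local = int(pop_size * local_frac)
    for _ in range(generations):
        candidates = []
        # Local intensification: perturb the best centroids, then refine with K-means.
        for _ in range(n_local):
            moved = [[v + rng.gauss(0.0, step) for v in c] for c in best]
            candidates.append(lloyd(points, moved))
        # Global exploration: brand-new random initializations, then refine.
        for _ in range(pop_size - n_local):
            candidates.append(lloyd(points, random_start()))
        for cand in candidates:
            cost = sse(points, cand)
            if cost < best_cost:  # elitism: keep the best solution seen so far
                best, best_cost = cand, cost
    return best, best_cost


if __name__ == "__main__":
    # Three tight, well-separated blobs of 9 points each.
    centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    data = [(cx + dx, cy + dy) for cx, cy in centers
            for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
    centroids, cost = population_kmeans(data, k=3)
    print(sorted(tuple(round(v, 3) for v in c) for c in centroids), cost)
```

Because the best solution is replaced only by a strictly better one, the returned SSE can never exceed that of the best plain K-means run in the initial population, which is how such a wrapper addresses K-means' sensitivity to its starting partition. In this sketch, `local_frac` is the knob that trades local intensification against global exploration.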
