A novel adaptive density-based ACO algorithm with minimal encoding redundancy for clustering problems

In the so-called Big Data paradigm, descriptive analytics are widely conceived as techniques and models aimed at discovering knowledge within unlabeled datasets (e.g., patterns and similarities) that is of utmost help for subsequent predictive and prescriptive methods. One of these techniques is clustering, which relies on different multi-dimensional measures of similarity between unlabeled data instances in order to group them into clusters without supervision. Among the myriad of clustering approaches reported in the literature, this manuscript focuses on those based on bio-inspired meta-heuristics, which have lately been shown to outperform traditional clustering schemes in terms of convergence, adaptability and parallelization. Specifically, this work presents a new clustering approach based on the processing fundamentals of the Ant Colony Optimization (ACO) algorithm, i.e., stigmergy via pheromone trails and the progressive construction of solutions through a graph. The novelty of the proposed scheme with respect to previous research on ACO-based clustering lies in a significantly pruned graph that not only minimizes the representation redundancy of the problem at hand, but also allows for an embedded estimation of the number of clusters within the data. However, this approach requires a modified ant behavior that accounts for the optimality of entire paths rather than that of single steps within the graph. Simulation results over conventional datasets evince the promising performance of our approach and motivate further research on its applicability to real scenarios.
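The abstract only outlines the mechanism, so the following Python sketch is a rough illustration under explicit assumptions, not the authors' algorithm. It assumes a linear-linkage-style encoding as one possible "pruned graph": each item either links to itself (closing its cluster) or to a later item that no earlier item already points to, so every partition has exactly one encoding and the number of self-links gives the number of clusters. It further assumes the silhouette coefficient as the fitness of an ant's entire path, a distance-based heuristic, and hypothetical parameters `radius`, `alpha`, `beta` and `rho`. The density-based components named in the title are not reproduced.

```python
import math
import random


def decode(links):
    """Map a linear-linkage vector to cluster labels.

    links[i] == i closes a cluster; links[i] == j (j > i) points to the next
    member of i's cluster. Items with no incoming link start a cluster.
    """
    n = len(links)
    has_pred = [False] * n
    for i, j in enumerate(links):
        if j != i:
            has_pred[j] = True
    labels, label = [-1] * n, 0
    for start in range(n):
        if has_pred[start]:
            continue
        k = start
        while True:  # walk the chain until its self-link
            labels[k] = label
            if links[k] == k:
                break
            k = links[k]
        label += 1
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; -1.0 if there are fewer than 2 or n clusters."""
    n, k = len(points), max(labels) + 1
    if k < 2 or k >= n:
        return -1.0
    total = 0.0
    for i in range(n):
        sums, counts = [0.0] * k, [0] * k
        for j in range(n):
            if j != i:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in range(k) if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def construct(tau, eta, alpha, beta, rng):
    """One ant builds a link vector over the pruned (forward-only) graph."""
    n = len(tau)
    links, targeted = [0] * n, [False] * n
    for i in range(n):
        # Allowed moves: close the cluster (self-link) or link forward to an
        # item no earlier item points to (keeps the encoding one-to-one).
        cands = [i] + [j for j in range(i + 1, n) if not targeted[j]]
        weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cands]
        j = rng.choices(cands, weights=weights)[0]
        links[i] = j
        if j != i:
            targeted[j] = True
    return links


def aco_cluster(points, n_ants=20, n_iter=50, alpha=1.0, beta=2.0,
                rho=0.1, radius=1.0, seed=0):
    """Hypothetical sketch: encoding, heuristic and fitness are assumptions."""
    rng = random.Random(seed)
    n = len(points)
    eta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        eta[i][i] = 1.0 / (1e-9 + radius)  # hypothetical "close cluster" heuristic
        for j in range(i + 1, n):
            eta[i][j] = 1.0 / (1e-9 + math.dist(points[i], points[j]))
    tau = [[1.0] * n for _ in range(n)]
    best_links, best_fit = None, -math.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            links = construct(tau, eta, alpha, beta, rng)
            fit = silhouette(points, decode(links))  # score the whole path
            if fit > best_fit:
                best_links, best_fit = links, fit
        for i in range(n):  # evaporation on the forward edges
            for j in range(i, n):
                tau[i][j] *= 1.0 - rho
        deposit = max(best_fit, 0.0) + 1e-3
        for i, j in enumerate(best_links):  # reinforce the entire best path
            tau[i][j] += deposit
    return decode(best_links), best_fit


pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
labels, fit = aco_cluster(pts, radius=0.5)
print(labels, round(fit, 3))
```

Because pheromone is deposited on every link of the best ant's complete solution, and ants are ranked only by the fitness of the decoded partition, the sketch rewards whole paths rather than individually short edges. The number of clusters is never fixed in advance; it emerges from how many items choose a self-link.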
