Particle swarm Optimized Density-based Clustering and Classification: Supervised and unsupervised learning approaches

Abstract

Clustering and classification, two pattern recognition techniques from machine learning, have been applied in many domains. Density-based clustering is an important family of clustering algorithms. The best-known density-based method is Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which can find arbitrarily shaped clusters in datasets. DBSCAN has three drawbacks: first, its parameters are hard to set; second, users cannot control the number of clusters; and third, it cannot be used directly as a classifier. This paper proposes a novel method, Particle swarm Optimized Density-based Clustering and Classification (PODCC), designed to offset these drawbacks. Particle Swarm Optimization (PSO), a widely used Evolutionary and Swarm Algorithm (ESA), has been applied to optimization problems in many research domains, including data analytics. PODCC uses a PSO variant, SPSO-2011, to search the parameter space and identify the best parameters for density-based clustering and classification. By applying the appropriate fitness function proposed in this paper, PODCC can operate in either a supervised or an unsupervised learning setting. With the proposed fitness function, users can supply the desired number of clusters as an input to PODCC. The method was evaluated on ten synthetic datasets and ten benchmark datasets selected from various open sources. The experimental results indicate that PODCC can outperform some established methods, especially on imbalanced datasets.
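The idea of letting a swarm search the DBSCAN parameter space for a user-specified number of clusters can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the fitness function (distance from the target cluster count plus the noise fraction), the parameter bounds, the function names, and the use of the classic global-best PSO update rather than SPSO-2011. It is not the fitness function or the implementation proposed in the paper.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 = noise, 0.. = cluster id)."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1               # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])   # only core points expand the cluster
    return labels

def fitness(points, eps, min_pts, k_target):
    """Hypothetical unsupervised fitness (lower is better), NOT the paper's."""
    labels = dbscan(points, eps, min_pts)
    k_found = len(set(labels) - {-1})
    noise_fraction = labels.count(-1) / len(labels)
    return abs(k_found - k_target) + noise_fraction

def pso_search(points, k_target, n_particles=12, n_iter=25, seed=0):
    """Classic global-best PSO over (eps, min_pts); bounds are assumed values."""
    rng = random.Random(seed)
    bounds = [(0.05, 2.0), (2.0, 10.0)]
    w, c1, c2 = 0.72, 1.49, 1.49         # commonly used inertia/acceleration values

    def score(p):
        return fitness(points, p[0], round(p[1]), k_target)

    pos = [[rng.uniform(a, b) for a, b in bounds] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [score(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (a, b) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(b, max(a, pos[i][d] + vel[i][d]))  # clamp to bounds
            val = score(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos[0], round(g_pos[1]), g_val

def make_blobs(seed=1):
    """Two well-separated Gaussian blobs of 30 points each (toy data)."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(30)]

if __name__ == "__main__":
    data = make_blobs()
    eps, min_pts, value = pso_search(data, k_target=2)
    print(f"eps={eps:.3f} min_pts={min_pts} fitness={value:.3f}")
```

A supervised variant could, under the same assumptions, swap `fitness` for a score computed against known class labels (for example, one minus a class-balanced accuracy of the cluster-to-class assignment), leaving the search loop unchanged.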
