A Study of Bio-inspired Algorithm to Data Clustering using Different Distance Measures

Data mining is the process of extracting previously unknown and valid information from large databases. Clustering is an important data analysis and data mining method: the unsupervised classification of objects into clusters such that objects in the same cluster are similar and objects in different clusters are dissimilar. Data clustering is a difficult unsupervised learning problem because many factors, such as the distance measure, the criterion function, and the initial conditions, come into play. Many algorithms have been proposed in the literature. However, some traditional algorithms have drawbacks, such as sensitivity to initialization and a tendency to become trapped in local optima. Recently, bio-inspired algorithms such as ant colony optimization (ACO) and particle swarm optimization (PSO) have been applied successfully to clustering problems. These algorithms are global optimization techniques and have also been used in several other real-life applications. Distance-based algorithms have been widely studied for clustering problems. This paper presents a study of a particle swarm optimization algorithm for data clustering using different distance measures, namely Euclidean, Manhattan, and Chebyshev, on well-known real-life benchmark medical data sets and an artificially generated data set. The PSO-based clustering algorithm using the Chebyshev distance measure achieves better fitness values than the same algorithm using the Euclidean or Manhattan distance measures.
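For concreteness, the three distance measures compared in this study can be sketched in a few lines of Python. The abstract does not state which fitness function is used, so the `quantization_error` function below is an assumption: it is one fitness that is common in PSO-based clustering, not necessarily the one used here. All function names are illustrative.

```python
def euclidean(x, y):
    """Euclidean (L2) distance: square root of the sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def manhattan(x, y):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Chebyshev (L-infinity) distance: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def quantization_error(data, centroids, dist):
    """Assumed fitness (lower is better): assign each point to its nearest
    centroid under `dist`, average the distances within each non-empty
    cluster, then average those per-cluster means."""
    clusters = [[] for _ in centroids]
    for point in data:
        distances = [dist(point, c) for c in centroids]
        nearest = distances.index(min(distances))
        clusters[nearest].append(distances[nearest])
    non_empty = [c for c in clusters if c]
    return sum(sum(c) / len(c) for c in non_empty) / len(non_empty)


if __name__ == "__main__":
    # Toy data: in a PSO-based scheme, each particle would encode one such
    # set of candidate centroids and be scored by the fitness function.
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
    centroids = [(0.0, 1.0), (10.0, 0.0)]
    for name, dist in [("Euclidean", euclidean),
                       ("Manhattan", manhattan),
                       ("Chebyshev", chebyshev)]:
        print(name, quantization_error(data, centroids, dist))
```

Passing the distance as a parameter keeps the fitness function unchanged across the three measures, so the distance measure is the only thing that varies between runs.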
