Particle swarm optimisation representations for simultaneous clustering and feature selection

Clustering, the process of grouping unlabelled data, is an important task in data analysis. It is regarded as one of the most difficult such tasks because of the large search space that must be explored. Feature selection is commonly used to reduce the size of the search space. Evolutionary computation (EC) comprises a family of techniques known to find good solutions to difficult problems such as clustering and feature selection. However, relatively little work has applied EC methods to simultaneous clustering and feature selection. In this paper, we compare medoid and centroid representations that allow particle swarm optimisation (PSO) to perform simultaneous clustering and feature selection. We propose several new techniques that improve clustering performance and ensure that valid solutions are generated. Experiments on a range of real-world and synthetic datasets analyse the effectiveness of the PSO representations against several criteria. We show that a medoid representation can achieve better results than the widely used centroid representation.
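The abstract does not spell out how either representation is encoded. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes a particle's first d values are feature weights, thresholded at a hypothetical 0.5 to form a feature mask. In the centroid representation these weights are followed by k·d continuous centroid coordinates; in the medoid representation they are followed by k values that are rounded and clamped to instance indices. Each instance is then assigned to its nearest centre using only the selected features. The threshold, the empty-mask fallback, and all function names are hypothetical, and the PSO velocity/position update and the fitness function are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical particle layout (an assumption, not the paper's encoding):
#   position = [w_1 .. w_d | cluster part]
# where feature f is selected when w_f > THRESHOLD.
THRESHOLD = 0.5


def decode_features(weights: Sequence[float], threshold: float = THRESHOLD) -> List[bool]:
    """Turn continuous feature weights into a boolean selection mask."""
    mask = [w > threshold for w in weights]
    if not any(mask):
        # Assumed repair step: keep the highest-weighted feature so the mask is never empty.
        best = max(range(len(weights)), key=lambda f: weights[f])
        mask[best] = True
    return mask


def masked_distance(a: Sequence[float], b: Sequence[float], mask: Sequence[bool]) -> float:
    """Euclidean distance computed over the selected features only."""
    return math.sqrt(sum((x - y) ** 2 for x, y, keep in zip(a, b, mask) if keep))


def assign(data, centres, mask) -> List[int]:
    """Label each instance with the index of its nearest cluster centre (ties go to the lower index)."""
    labels = []
    for point in data:
        dists = [masked_distance(point, c, mask) for c in centres]
        labels.append(min(range(len(centres)), key=lambda j: dists[j]))
    return labels


def decode_centroid(position, data, k) -> Tuple[List[bool], List[int]]:
    """Centroid representation: k * d continuous coordinates follow the feature weights."""
    d = len(data[0])
    mask = decode_features(position[:d])
    flat = position[d:d + k * d]
    centres = [flat[j * d:(j + 1) * d] for j in range(k)]
    return mask, assign(data, centres, mask)


def decode_medoid(position, data, k) -> Tuple[List[bool], List[int]]:
    """Medoid representation: k values rounded and clamped to instance indices.

    Duplicate indices are not repaired in this sketch.
    """
    n, d = len(data), len(data[0])
    mask = decode_features(position[:d])
    indices = [min(n - 1, max(0, int(round(v)))) for v in position[d:d + k]]
    centres = [data[i] for i in indices]
    return mask, assign(data, centres, mask)


# Toy data: two groups separated in features 0 and 1; feature 2 is noise.
DATA = [[0.0, 0.0, 5.0], [0.1, 0.2, -3.0], [5.0, 5.0, 0.0], [5.1, 4.9, 9.0]]

if __name__ == "__main__":
    print(decode_medoid([0.9, 0.8, 0.1, 0.0, 2.0], DATA, k=2))
    # -> ([True, True, False], [0, 0, 1, 1])
    print(decode_centroid([0.9, 0.8, 0.1, 0.05, 0.1, 0.0, 5.05, 4.95, 0.0], DATA, k=2))
    # -> ([True, True, False], [0, 0, 1, 1])
    print(assign(DATA, [DATA[0], DATA[2]], [True, True, True]))
    # -> [0, 1, 1, 0]
```

On this toy data, both decodings recover the two groups once the noisy third feature is deselected. Assigning with all three features (the last call) mixes the groups, illustrating how an irrelevant feature can distort the partition. A medoid particle needs only k cluster values instead of k·d, and its centres are always real instances.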
