A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine suitable parameter values for these three modified algorithms in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms and conclude that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.
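
To make the notion of mixed and incomplete data concrete, the following minimal sketch compares two patterns that contain numeric features, categorical features, and missing values. It assumes a HEOM-style heterogeneous distance (Heterogeneous Euclidean-Overlap Metric); the abstract does not state which dissimilarity the proposed algorithms use, so the function name `heom`, the `ranges` argument, and the use of `None` for missing values are illustrative assumptions rather than the authors' method.

```python
import math

def heom(x, y, ranges):
    """HEOM-style distance between two patterns (illustrative assumption).

    ranges[i] is (max - min) of numeric feature i over the data set,
    or None when feature i is categorical. Missing values are None.
    """
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if a is None or b is None:
            d = 1.0                      # missing value: maximal per-feature distance
        elif r is None:
            d = 0.0 if a == b else 1.0   # categorical feature: overlap (match/mismatch)
        else:
            d = abs(a - b) / r if r > 0 else 0.0  # numeric feature: range-normalized difference
        total += d * d
    return math.sqrt(total)

# Two patterns: (numeric, categorical, numeric with a missing value)
p = [1.0, "red", None]
q = [3.0, "red", 5.0]
distance = heom(p, q, ranges=[4.0, None, 10.0])
```

A swarm-based clustering algorithm could use such a function to assign each pattern to its nearest cluster representative without first imputing or discarding incomplete patterns.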