Swarm Intelligence Algorithms in Text Document Clustering with Various Benchmarks

Text document clustering is the unsupervised grouping of text documents into clusters by content similarity. It has applications such as search optimization and extracting hidden information from data generated by IoT sensors. Swarm intelligence (SI) algorithms are stochastic heuristics in which many simple, individually unintelligent agents follow a few simple rules and collectively accomplish complex tasks. By mapping the features of a problem to the parameters of an SI algorithm, one can obtain solutions in a flexible, robust, decentralized, and self-organized manner. These solving mechanisms make swarm algorithms better suited than traditional clustering algorithms to complex document clustering problems. However, each SI algorithm performs differently depending on its own strengths and weaknesses. In this paper, to find the best-performing SI algorithm for text document clustering, we carry out a comparative study of particle swarm optimization (PSO), the bat algorithm, grey wolf optimization (GWO), and K-means on six data sets of various sizes created from BBC Sport news and 20 Newsgroups. Based on our experimental results, we relate the features of the document clustering problem to the nature of SI algorithms and conclude that PSO and GWO outperform K-means, and that, among the algorithms compared, PSO performs best at finding the optimal solution.
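To make the idea of mapping a clustering problem onto a swarm algorithm concrete, the following is a minimal sketch of PSO-based document clustering. It is not the paper's implementation: the abstract does not specify the particle encoding, objective, or parameters, so everything here is an assumption chosen for illustration. Each particle is assumed to encode k candidate centroids, the fitness is assumed to be the average cosine distance from each document to its nearest centroid, and the toy term-frequency vectors, the function names, and the values of `w`, `c1`, and `c2` are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def fitness(docs, centroids):
    """Assumed objective: mean distance from each document to its nearest centroid."""
    return sum(min(cosine_distance(d, c) for c in centroids) for d in docs) / len(docs)


def assign(docs, centroids):
    """Label each document with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: cosine_distance(d, centroids[j]))
        for d in docs
    ]


def pso_cluster(docs, k, n_particles=10, n_iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Standard global-best PSO over flattened sets of k centroids (illustrative only)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    size = k * dim

    def unflatten(p):
        return [p[i * dim:(i + 1) * dim] for i in range(k)]

    # Initialise each particle from k randomly chosen documents.
    positions = [[x for c in rng.sample(docs, k) for x in c] for _ in range(n_particles)]
    velocities = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(docs, unflatten(p)) for p in positions]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(size):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                    + c2 * r2 * (gbest[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            f = fitness(docs, unflatten(positions[i]))
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = positions[i][:], f
        history.append(gbest_fit)

    return unflatten(gbest), gbest_fit, history


# Hypothetical toy corpus: term-frequency vectors over four terms, two rough topics.
docs = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 1.0],
    [4.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 3.0, 2.0],
    [0.0, 1.0, 2.0, 4.0],
    [1.0, 0.0, 3.0, 3.0],
]

centroids, best_fit, history = pso_cluster(docs, k=2)
labels = assign(docs, centroids)
print("labels:", labels)
print("best fitness:", best_fit)
```

Under the same assumptions, a bat or GWO variant would keep the centroid encoding and the fitness function and change only the position-update rule, which is what makes a like-for-like comparison against K-means possible.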
