Solving text clustering problem using a memetic differential evolution algorithm

Text clustering is considered one of the most effective methods for analyzing text documents, and it is widely applied to group documents as big data and online information continue to expand. A review of related work shows that existing text clustering algorithms achieve reasonable clustering results on some datasets but fail on a wide variety of benchmark datasets. Furthermore, their performance is not robust because of an inefficient balance between the exploitation and exploration capabilities of the clustering algorithm. Accordingly, this research proposes a Memetic Differential Evolution algorithm (MDETC) to solve the text clustering problem. MDETC investigates the effect of hybridizing the differential evolution (DE) mutation strategy with the memetic algorithm (MA). This hybridization is intended to improve the quality of text clustering and to strengthen the exploitation and exploration capabilities of the algorithm. Experimental results on six standard text clustering benchmark datasets from the Laboratory of Computational Intelligence (LABIC) show that MDETC outperformed the other compared clustering algorithms in terms of the AUC metric and F-measure, and statistical analysis supports this finding. In addition, MDETC was compared with state-of-the-art text clustering algorithms and obtained nearly the best results on the standard benchmark datasets.
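The abstract describes MDETC only at a high level, as a hybrid of a DE mutation strategy and a memetic algorithm. The following Python sketch illustrates that general idea under explicitly stated assumptions; it is not the MDETC implementation. Each individual encodes k cluster centroids. New candidates come from the classic DE/rand/1 mutation with binomial crossover, applied centroid by centroid. The memetic (local search) step is assumed to be a single k-means refinement. The fitness function (mean squared Euclidean distance of each document vector to its nearest centroid), the parameter values `F`, `CR`, `pop_size` and `generations`, and all function names are hypothetical choices for illustration. Real text clustering usually works on TF-IDF vectors and often uses cosine similarity; Euclidean distance is used here only to keep the sketch short.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(docs, centroids):
    """Index of the nearest centroid for each document vector."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(d, centroids[c]))
            for d in docs]

def fitness(docs, centroids):
    """Assumed objective: mean squared distance to the nearest centroid (lower is better)."""
    return sum(min(sq_dist(d, c) for c in centroids) for d in docs) / len(docs)

def local_search(docs, centroids):
    """Assumed memetic step: one k-means update moving each centroid to its members' mean."""
    labels = assign(docs, centroids)
    refined = []
    for k, c in enumerate(centroids):
        members = [d for d, lab in zip(docs, labels) if lab == k]
        if members:
            refined.append([sum(col) / len(members) for col in zip(*members)])
        else:
            refined.append(list(c))  # an empty cluster keeps its centroid unchanged
    return refined

def memetic_de(docs, k, pop_size=10, generations=50, F=0.5, CR=0.9, seed=0):
    """Hypothetical memetic DE: DE/rand/1/bin variation plus k-means refinement."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Each individual is a list of k centroids, initialised from distinct random documents.
    pop = [[list(d) for d in rng.sample(docs, k)] for _ in range(pop_size)]
    scores = [fitness(docs, ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 distinct from i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            trial = []
            for c in range(k):
                j_rand = rng.randrange(dim)  # guarantees at least one mutant component
                row = []
                for j in range(dim):
                    if rng.random() < CR or j == j_rand:  # binomial crossover
                        row.append(pop[r1][c][j] + F * (pop[r2][c][j] - pop[r3][c][j]))
                    else:
                        row.append(pop[i][c][j])
                trial.append(row)
            trial = local_search(docs, trial)  # exploitation via local refinement
            score = fitness(docs, trial)
            if score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

if __name__ == "__main__":
    # Toy "document vectors" forming two obvious groups.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    centroids, score = memetic_de(docs, k=2)
    print(assign(docs, centroids), round(score, 4))
```

In this sketch, the DE difference vector supplies exploration, while the k-means refinement supplies exploitation. Together they give one concrete, assumed reading of the exploration/exploitation balance that the abstract says a memetic hybrid should improve. Greedy selection keeps a trial only if it does not worsen its parent's fitness.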
