Basic firefly algorithm for document clustering

Document clustering plays a significant role in Information Retrieval (IR), where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed, including K-means and Particle Swarm Optimization (PSO). Even though these algorithms have been widely applied in many disciplines due to their simplicity, they tend to become trapped in a local minimum during the search for an optimal solution. To address this shortcoming, this paper proposes a Basic Firefly Algorithm (Basic FA) to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates more robust and compact clusters than the ones produced by K-means and PSO.
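To make the objective function concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that ADDC is computed as the mean, over non-empty clusters, of the average distance between each document vector and its nearest centroid, and that distance is Euclidean; the abstract specifies neither. The `move_towards` helper shows the standard firefly attraction step from Yang's formulation, in which a firefly is pulled toward a brighter one with strength `beta0 * exp(-gamma * r^2)` plus a small random perturbation. The function names and the default parameter values are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def addc(docs, centroids, distance=euclidean):
    """Average distance of documents to their cluster centroid (assumed form).

    Each document is assigned to its nearest centroid. The distances are
    averaged within each cluster, then averaged across non-empty clusters.
    Lower values indicate more compact clusters.
    """
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for doc in docs:
        dists = [distance(doc, c) for c in centroids]
        k = min(range(len(centroids)), key=dists.__getitem__)
        sums[k] += dists[k]
        counts[k] += 1
    # Assumption: empty clusters are skipped rather than counted as zero.
    per_cluster = [s / n for s, n in zip(sums, counts) if n > 0]
    return sum(per_cluster) / len(per_cluster)


def move_towards(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=random):
    """Move firefly x_i toward a brighter firefly x_j (standard FA update).

    Attractiveness decays with squared distance: beta = beta0 * exp(-gamma * r^2).
    alpha scales a uniform random step in [-0.5, 0.5) per dimension.
    """
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    beta = beta0 * math.exp(-gamma * r2)
    return [
        a + beta * (b - a) + alpha * (rng.random() - 0.5)
        for a, b in zip(x_i, x_j)
    ]
```

In a clustering setting, one plausible encoding (again an assumption) is for each firefly to hold a flattened set of candidate centroids, with a lower ADDC corresponding to a brighter firefly, so that `move_towards` pulls weaker candidate solutions toward more compact ones.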
