FIREFLYCLUST: AN AUTOMATED HIERARCHICAL TEXT CLUSTERING APPROACH

Text clustering is a text mining task widely employed in search engines. Discovering the optimal number of clusters for a dataset or repository is a challenging problem. Various clustering algorithms have been reported in the literature, but most of them rely on a predefined number of clusters, k. In this study, a variant of the Firefly algorithm, termed FireflyClust, is proposed to automatically cluster text documents in a hierarchical manner. The proposed clustering method operates in five phases: data pre-processing, clustering, item re-location, cluster selection and cluster refinement. Experiments are undertaken with different selections of the threshold value. Results on the TREC collections TR11, TR12, TR23 and TR45 showed that FireflyClust is a better approach than Bisect K-means, hybrid Bisect K-means and the Practical General Stochastic Clustering Method. This result points to directions for developing better information retrieval engines in the dynamic and fast-growing big data era.
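
For readers unfamiliar with the Firefly algorithm on which FireflyClust builds, the following is a minimal sketch of the standard firefly movement rule only, not the FireflyClust variant or its five phases. In the standard rule, each firefly moves toward every brighter firefly with an attractiveness that decays with squared distance. The function name `firefly_step`, the parameter defaults, and the representation of fireflies as plain coordinate lists are assumptions made for illustration; the abstract does not specify how documents are encoded or how brightness is computed.

```python
import math
import random


def firefly_step(positions, brightness, beta0=1.0, gamma=1.0, alpha=0.0, rng=None):
    """One sweep of the standard firefly update (illustrative, not FireflyClust).

    positions  : list of coordinate lists, one per firefly
    brightness : list of floats; a higher value means a more attractive firefly
    beta0      : attractiveness at distance zero
    gamma      : light absorption coefficient
    alpha      : scale of the random perturbation
    """
    rng = rng or random.Random(0)
    # Work on copies so that distances to the attracting fireflies use the
    # positions from the start of the sweep.
    new_positions = [list(p) for p in positions]
    for i, xi in enumerate(new_positions):
        for j, xj in enumerate(positions):
            if brightness[j] > brightness[i]:
                # Squared Euclidean distance between firefly i and firefly j.
                r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
                beta = beta0 * math.exp(-gamma * r2)
                for d in range(len(xi)):
                    xi[d] += beta * (xj[d] - xi[d]) + alpha * (rng.random() - 0.5)
    return new_positions


# Hypothetical usage: the dimmer firefly moves halfway toward the brighter one.
moved = firefly_step([[0.0, 0.0], [1.0, 0.0]], [0.1, 0.9], beta0=0.5, gamma=0.0)
```

In a clustering setting, a firefly would typically stand for a document or a candidate centroid and brightness for some quality measure, but those choices are specific to each method and are not given in the abstract.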
