Text document classification using swarm intelligence

This paper presents an algorithm for the automatic grouping of PDF documents, with potential application to Web document classification. The algorithm is based on an ant-clustering algorithm, which was inspired by the way some ant species organize their nests. Applying the ant-clustering algorithm to text document classification required two modifications to the standard algorithm: 1) a metric that evaluates the degree of similarity between text data rather than numeric data; and 2) a cooling schedule for a user-defined parameter, proposed to improve the convergence properties of the algorithm. To illustrate the behavior of the modified algorithm, it was applied to sets of real-world documents taken from the IEEE WCCI-1998 CD.
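The two modifications can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the text metric, the cooled parameter, or the schedule, so everything below is an assumption chosen for illustration. The sketch assumes cosine distance over bag-of-words counts as the text metric, the pick/drop probability forms commonly used in standard ant clustering, and a geometric cooling of the pick constant. All function and parameter names (`term_vector`, `density`, `k_pick`, `k_drop`, `alpha`, `rate`, `floor`) are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a lower-cased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors (assumed text metric)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def density(doc, neighbours, alpha, cells):
    """Local similarity density around `doc`, clamped below at 0.

    `neighbours` are the documents in the ant's field of view, `cells` is the
    number of grid cells in that view, and `alpha` scales the distance.
    """
    total = sum(1.0 - cosine_distance(doc, n) / alpha for n in neighbours)
    return max(0.0, total / cells)


def pick_probability(f, k_pick):
    """High when the carried-free ant sees a document in a sparse or dissimilar area."""
    return (k_pick / (k_pick + f)) ** 2


def drop_probability(f, k_drop):
    """High when the loaded ant stands in an area dense with similar documents."""
    return (f / (k_drop + f)) ** 2


def cooled(value, rate, floor):
    """Geometric cooling step (assumed schedule): shrink `value`, never below `floor`."""
    return max(floor, value * rate)
```

Under these assumptions, lowering `k_pick` over time makes ants progressively less likely to pick up documents that already sit among similar ones, which is one way a cooling schedule could help the grid settle into stable clusters.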
