THE PROPOSAL OF TWO BIO-INSPIRED ALGORITHMS FOR TEXT CLUSTERING

The Internet can be seen as a major repository of resources and information. The growing demand for information, together with the large amount of data available, has stimulated research on text mining methods. This work applies feature selection and text clustering techniques based on two bio-inspired algorithms: a Particle Swarm Clustering (PSC) algorithm and RABNET (Real-valued Antibody Network), an artificial neural network modeled as a competitive and constructive antibody network. The aim is to show that both techniques produce relevant results when applied to text clustering problems.
