A New Biomimetic Approach Based on Social Spiders for Clustering of Text

Given the explosion in the volume of data circulating on the web (satellite data, genomic data, ...), classifying these data with data mining techniques has become necessary. In this work, clustering is performed with a bio-inspired method based on social spiders, because no current learning method can represent unstructured data (text) almost directly. A good classification therefore requires a good representation of the data. Each document is represented by a vector whose components are weights computed over the whole corpus (TF-IDF). Text documents are represented with a language-independent method based on character and word n-grams. Several similarity measures were tested. To validate the classification, we used an evaluation measure based on recall and precision (the F-measure).
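The representation and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The trigram size, the TF-IDF variant (relative term frequency times log(N/df)), the choice of cosine as the similarity measure, and the balanced form of the F-measure are all assumptions made for illustration, and the function names are hypothetical. The social-spider clustering step itself is not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build one sparse TF-IDF vector (dict) per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each n-gram counted once per document
    num_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append(
            {g: (f / total) * math.log(num_docs / df[g]) for g, f in c.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (one possible measure)."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy corpus (invented for this sketch, not data from the paper).
docs = ["spider web", "spider silk", "genome data"]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # documents sharing "spider" trigrams
sim_unrelated = cosine(vecs[0], vecs[2])  # documents sharing no trigram
```

In this toy corpus the first two documents share several trigrams, so their similarity is positive, while the first and third share none and score zero. Because the n-grams are taken over raw characters, no language-specific tokenizer or stemmer is needed, which is what makes the representation language-independent.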
