Comparison of neural models for document clustering

Abstract. We compared several algorithms applied to document clustering: Fuzzy C-Means, Fuzzy ART, Fuzzy ART for Fuzzy Clusters, Fuzzy Max-Min, and the Kohonen neural network. Of these, only Fuzzy C-Means is not a neural network. We generated a testbed from LISA and used some of the descriptors assigned to its records to compare the results. The best results were obtained with Kohonen's algorithm, which also organizes the clusters topologically. We conclude with a more detailed discussion of the possibilities offered by Kohonen's algorithm.
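The abstract credits Kohonen's algorithm with organizing the clusters topologically. The pure-Python self-organizing map below is a minimal sketch of where that property comes from. It is not the authors' implementation, and the grid size, the linear learning-rate and neighbourhood schedules, the Gaussian neighbourhood, and the toy term vectors are all assumptions chosen for illustration. Each update pulls the winning unit toward the document, and it also pulls the winner's neighbours on the map grid. Because of this, nearby units end up representing similar documents.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(docs, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length document vectors (illustrative schedules)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    radius0 = max(rows, cols) / 2.0
    # One weight vector per map unit, indexed by its (row, col) grid position.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(docs)
    step = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
            win = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Neighbourhood is measured on the map grid, not in term space;
                # this is what gives the map its topological ordering.
                grid_d2 = (r - win[0]) ** 2 + (c - win[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical toy term-weight vectors over four index terms:
# the first three documents share terms 0-1, the last three share terms 2-3.
docs = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.8, 0.0, 0.0], [0.9, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.8, 1.0], [0.0, 0.0, 1.0, 0.9]]
som = train_som(docs, rows=2, cols=2)
for d in docs:
    print(d, "->", best_matching_unit(som, d))
```

Each document is assigned to the map unit that wins it, so every unit acts as a cluster. A real testbed such as the one built from LISA would use much longer, weighted term vectors and a larger grid.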
