Reduction of the dimension of a document space using the fuzzified output of a Kohonen network

The vectors used in information retrieval (IR), whether they represent documents or terms, are high dimensional, and their dimensionality grows as one approaches real-world problems. The algorithms used to manipulate them, however, demand rapidly increasing computational resources as that dimensionality grows. We used the Kohonen algorithm and a fuzzification module to perform a fuzzy clustering of the terms. The resulting degrees of membership were used to represent the terms and, by extension, the documents, yielding vectors with fewer components that still carry meaning. To test the results, we applied a topological classification to sets of transformed and untransformed vectors to check that the same structure underlies both.
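
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D map, the fuzzy c-means style membership formula used as the fuzzification step, the toy document-term matrix, and all function names and parameter values are assumptions made for illustration only.

```python
import math
import random

def train_som(vectors, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a tiny 1-D Kohonen map; returns one weight vector per unit."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = max(1.0, (n_units / 2.0) * (1.0 - frac))  # shrinking neighbourhood
        for v in vectors:
            # Winner = unit whose weight vector is closest to the input.
            winner = min(range(n_units), key=lambda u: math.dist(v, weights[u]))
            for u in range(n_units):
                h = math.exp(-((u - winner) ** 2) / (2.0 * radius ** 2))
                weights[u] = [w + lr * h * (x - w) for w, x in zip(weights[u], v)]
    return weights

def fuzzy_memberships(v, weights, m=2.0, eps=1e-12):
    """Assumed fuzzification: fuzzy c-means style memberships that sum to 1."""
    inv = [(1.0 / (math.dist(v, w) + eps)) ** (2.0 / (m - 1.0)) for w in weights]
    total = sum(inv)
    return [x / total for x in inv]

def reduce_documents(doc_term, term_memberships):
    """Represent each document as the weighted sum of its terms' memberships."""
    n_units = len(term_memberships[0])
    return [
        [sum(wt * term_memberships[t][u] for t, wt in enumerate(doc))
         for u in range(n_units)]
        for doc in doc_term
    ]

# Toy data (hypothetical): 4 documents x 6 terms, entries are term weights.
doc_term = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 1],
]
# Each term is described by its column, i.e. its weights across documents.
term_vectors = [[row[t] for row in doc_term] for t in range(len(doc_term[0]))]

som = train_som(term_vectors, n_units=3)
memberships = [fuzzy_memberships(tv, som) for tv in term_vectors]
reduced = reduce_documents(doc_term, memberships)  # 4 documents x 3 components
```

In this sketch each document goes from 6 term components to 3 components, one per map unit. Because every term's memberships sum to 1, each reduced document vector sums to the total term weight of the original document.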
