WEBSOM - Self-organizing maps of document collections

Abstract

With the WEBSOM method, a textual document collection can be organized onto a graphical map display that provides an overview of the collection and facilitates interactive browsing. Interesting documents can be located on the map using a content-directed search. Each document is encoded as a histogram of word categories, which are formed by the self-organizing map (SOM) algorithm from the similarities in the contexts in which the words occur. The encoded documents are then organized on a second self-organizing map, the document map, on which nearby locations contain similar documents. Special consideration is given to the computation of very large document maps. This is possible with general-purpose computers if the dimensionality of the word category histograms is first reduced with a random mapping method and if computationally efficient algorithms are used to compute the SOMs.
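The document encoding and the random mapping step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the WEBSOM implementation. `word_to_category` is a hypothetical lookup standing in for the word category map (in WEBSOM, a word's category comes from a SOM trained on word contexts). Histograms are normalized to sum to one so that documents of different lengths are comparable, which is an assumption made here. The random mapping uses a dense Gaussian matrix with unit-length columns, one common variant; the construction used in WEBSOM may differ. The idea is that with enough output dimensions the random columns are nearly orthogonal, so similarities between histograms are approximately preserved while the vector length drops from the number of categories to `n_out`.

```python
import math
import random

# Illustrative sketch; all function names here are hypothetical.


def category_histogram(words, word_to_category, n_categories):
    """Encode a document as a histogram over word categories, normalized to sum 1.

    word_to_category is a hypothetical lookup, e.g. each word's best-matching
    unit on a word category map; words not in the lookup are skipped.
    """
    hist = [0.0] * n_categories
    for w in words:
        c = word_to_category.get(w)
        if c is not None:
            hist[c] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def random_mapping_matrix(n_in, n_out, seed=0):
    """Return an n_out x n_in matrix (list of rows) with unit-length Gaussian columns."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_in):
        col = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        norm = math.sqrt(sum(v * v for v in col))
        columns.append([v / norm for v in col])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows


def project(matrix, x):
    """Random mapping y = R x: reduce x from n_in to n_out dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


# Usage with a toy three-category lookup (all names are illustrative).
word_to_category = {"neural": 0, "network": 0, "map": 1, "text": 2, "document": 2}
h = category_histogram("neural network map of text document".split(), word_to_category, 3)
R = random_mapping_matrix(n_in=3, n_out=2)
y = project(R, h)
```

Here `h` is `[0.4, 0.2, 0.4]` because the word "of" has no category. In a realistic setting `n_in` would be the number of word categories and `n_out` a considerably smaller number; the toy sizes only show the mechanics.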

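The document map, and a simple form of content-directed search on it, can be sketched with the plain incremental SOM rule. Each input pulls its best-matching unit and that unit's grid neighbors toward it, with a learning rate and a Gaussian neighborhood width that shrink over time. Everything below is an illustrative assumption: the function names, the linear decay schedules, the 2-D grid with random initialization, and treating search as "return the documents mapped to the query's best-matching unit". The sketch does not reproduce the computationally efficient algorithms used for very large maps.

```python
import math
import random

# Illustrative sketch; all function names here are hypothetical.


def best_matching_unit(models, x):
    """Return the grid node whose model vector is closest to x (squared Euclidean)."""
    return min(models, key=lambda node: sum((m - v) ** 2 for m, v in zip(models[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM with the incremental Kohonen update rule.

    Returns a dict mapping grid coordinates (r, c) to model vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    models = {(r, c): [rng.random() for _ in range(dim)]
              for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            winner = best_matching_unit(models, x)
            for node, m in models.items():
                d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
                h = lr * math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for k in range(dim):
                    m[k] += h * (x[k] - m[k])        # move model toward x
            step += 1
    return models


def search(models, placement, query):
    """Hypothetical content-directed search: documents on the query's best-matching unit."""
    target = best_matching_unit(models, query)
    return [name for name, node in placement.items() if node == target]


# Usage: toy encoded documents (e.g. reduced category histograms).
docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 0.1, 0.9],
    "d": [0.0, 0.0, 1.0],
}
models = train_som(list(docs.values()), rows=2, cols=2)
placement = {name: best_matching_unit(models, v) for name, v in docs.items()}
hits = search(models, placement, [0.95, 0.05, 0.0])
```

Because nearby units end up with similar model vectors, documents with similar encodings, such as `a` and `b`, tend to land on the same or adjacent units. A query that falls on an empty unit returns no hits, so a practical interface might also examine neighboring units.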