Automatic labeling of self-organizing maps for information retrieval

The self-organizing map is a popular unsupervised neural network model for analyzing high-dimensional input data, such as the data that arises in information retrieval applications. Interpreting a trained map, however, requires considerable manual effort, particularly when analyzing the learned features and the characteristics of the identified clusters. We present the LabelSOM method, which uses the features learned by the map to automatically select the most descriptive features of the input patterns mapped onto a particular unit, thus making the characteristics of the various clusters within the map explicit. We demonstrate the approach on a text classification example using a real-world document archive. In this case, the features correspond to keywords describing the contents of a document. The document clusters are therefore characterized in terms of shared keywords, which makes it easy for the user to explore the contents of an unknown document archive.
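
To make the idea of selecting descriptive features for a unit concrete, the following is a minimal sketch and not necessarily the exact LabelSOM procedure. It assumes that a keyword describes a unit well when the documents mapped onto that unit agree closely with the unit's weight vector in that dimension, and when the weight itself is not negligible. The function name `label_unit`, the parameters `n_labels` and `min_weight`, and the toy data are all hypothetical.

```python
def label_unit(weight, mapped_vectors, vocabulary, n_labels=3, min_weight=0.1):
    """Return up to n_labels keywords describing one map unit.

    Assumption (not taken from the abstract): a keyword is a good label when
    the inputs mapped onto the unit deviate little from the unit's weight in
    that dimension, and the weight is at least min_weight.
    """
    if not mapped_vectors:
        return []  # nothing was mapped onto this unit, so nothing to describe

    candidates = []
    for k, term in enumerate(vocabulary):
        if weight[k] < min_weight:
            continue  # skip keywords that are (nearly) absent in this cluster
        # Summed squared deviation of the mapped inputs from the weight in dimension k.
        error = sum((x[k] - weight[k]) ** 2 for x in mapped_vectors)
        candidates.append((error, -weight[k], term))

    # Smallest deviation first; a larger weight breaks ties.
    candidates.sort()
    return [term for _, _, term in candidates[:n_labels]]


# Toy example: four keywords, one unit, two documents mapped onto it.
vocabulary = ["network", "protocol", "file", "print"]
weight = [0.9, 0.8, 0.05, 0.4]
mapped = [
    [1.0, 0.5, 0.0, 0.0],
    [0.8, 0.9, 0.1, 0.9],
]
labels = label_unit(weight, mapped, vocabulary, n_labels=2)
print(labels)  # ['network', 'protocol']
```

In this toy case "file" is excluded because its weight is close to zero, and "print" ranks last because the two documents disagree strongly on it, so the unit is labeled with the keywords its documents share.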
