Self-Organising Maps for Hierarchical Tree View Document Clustering Using Contextual Information

In this paper we propose a method for clustering documents into a dynamically built taxonomy of topics extracted directly from the documents themselves. Short contextual information from the text corpus is weighted by importance and used as input to a set of independently spawned growing Self-Organising Maps (SOMs). Using these contextual indexing units increases precision and labelling quality in the resulting topic hierarchy. Organising the maps in a tree structure, rather than as sets of conventional two-dimensional maps, creates topic hierarchies that are easy to browse and understand, in which documents are stored according to their content similarity.
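As a rough illustration of the pipeline described in the abstract, the sketch below extracts short word n-grams as contextual units, weights them, and feeds the resulting vectors to a self-organising map. It is a minimal sketch under stated assumptions, not the method of the paper: the TF-IDF style weighting, the fixed-size one-dimensional map (in place of growing maps arranged in a tree), and all function names and parameters are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def context_units(tokens, width=2):
    """Return short word n-grams (assumed form of the contextual units)."""
    return [" ".join(tokens[i:i + width]) for i in range(len(tokens) - width + 1)]


def weighted_vectors(docs, width=2):
    """Build unit-length vectors over contextual units.

    The TF-IDF style weight is an assumption; the abstract only says
    that units are weighted by importance.
    """
    unit_lists = [context_units(d.lower().split(), width) for d in docs]
    df = Counter(u for units in unit_lists for u in set(units))
    vocab = sorted(df)
    n_docs = len(docs)
    vectors = []
    for units in unit_lists:
        tf = Counter(units)
        vec = [tf[u] * (math.log(n_docs / df[u]) + 1.0) for u in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vectors


def best_matching_unit(nodes, vec):
    """Index of the node whose weights are closest to vec (squared Euclidean)."""
    dists = [sum((w - x) ** 2 for w, x in zip(node, vec)) for node in nodes]
    return dists.index(min(dists))


def train_som(vectors, n_nodes=2, epochs=50, seed=0):
    """Train a fixed-size 1-D SOM (a simplification of growing maps)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = 0.5 * decay                           # learning rate shrinks over time
        radius = max(0.5, decay * n_nodes / 2.0)   # neighbourhood width shrinks too
        for vec in vectors:
            bmu = best_matching_unit(nodes, vec)
            for j, node in enumerate(nodes):
                # Gaussian neighbourhood on the 1-D node index
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    node[k] += lr * h * (vec[k] - node[k])
    return nodes


if __name__ == "__main__":
    docs = [
        "self organising maps cluster documents",
        "self organising maps cluster documents by topic",
        "neural gas for vector quantization",
    ]
    vecs = weighted_vectors(docs)
    som = train_som(vecs, n_nodes=2)
    for doc, vec in zip(docs, vecs):
        print(best_matching_unit(som, vec), doc)
```

In a hierarchical variant, the documents assigned to each node could be used to train a further map one level down; that step is omitted here because the abstract does not specify the growth or splitting criteria.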
