A Self-Organizing Map Based Approach for Document Clustering and Visualization

In this paper, the clustering and visualization capabilities of the self-organizing map (SOM), specifically tailored to the analysis of textual data, are reviewed and further developed. A novel clustering and visualization approach is proposed for textual data mining. The approach first transforms the document space into a multidimensional vector space by means of citation patterns. An intuitive and effective projection method, the ranked centroid projection (RCP), is then applied in conjunction with a dynamic SOM model, the growing hierarchical self-organizing map (GHSOM), which automatically produces document maps at various levels of detail. The RCP serves both as a data analysis tool and as a direct interface to the data. We also extend the RCP to address the incremental clustering of dynamic document collections. In a set of simulations, the proposed approach is applied to a synthetic data set and two real-world scientific document collections to demonstrate its applicability.
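
The core idea of a ranked centroid projection can be illustrated with a small sketch. An input vector is placed on the 2-D map at a weighted centroid of the grid positions of its closest neurons, rather than only at its single best-matching unit. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `ranked_centroid_projection`, the parameter `top_r`, the use of Euclidean distance, and the linearly decreasing rank weights (`top_r - r + 1` for the r-th closest neuron) are all hypothetical choices made for illustration. The paper's exact weighting scheme may differ.

```python
from __future__ import annotations

import math
from typing import Sequence


def ranked_centroid_projection(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    positions: Sequence[tuple[float, float]],
    top_r: int = 3,
) -> tuple[float, float]:
    """Project x onto the map as a rank-weighted centroid of its top_r closest neurons.

    Hypothetical sketch: the r-th closest neuron (r = 1..top_r) receives the
    assumed weight top_r - r + 1, so closer neurons pull the point harder.
    """
    if len(weights) != len(positions):
        raise ValueError("each neuron needs both a weight vector and a grid position")
    if not 1 <= top_r <= len(weights):
        raise ValueError("top_r must be between 1 and the number of neurons")

    # Euclidean distance from the input vector to every neuron's weight vector.
    dists = [math.dist(x, w) for w in weights]
    # Neuron indices sorted from closest (rank 1) to farthest; keep the top_r.
    ranked = sorted(range(len(weights)), key=lambda i: dists[i])[:top_r]

    px = py = total = 0.0
    for rank, idx in enumerate(ranked, start=1):
        w = top_r - rank + 1  # assumed linearly decreasing weight by rank
        px += w * positions[idx][0]
        py += w * positions[idx][1]
        total += w
    return (px / total, py / total)


if __name__ == "__main__":
    # A toy 2x2 map whose neuron weight vectors coincide with their grid positions.
    grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(ranked_centroid_projection((0.1, 0.0), grid, grid, top_r=3))
```

For the input (0.1, 0.0), the three closest neurons are at (0, 0), (1, 0) and (0, 1), with weights 3, 2 and 1. The projected point is therefore (2/6, 1/6), or about (0.333, 0.167). With `top_r=1`, the sketch reduces to the usual best-matching-unit placement. Because it needs only a trained codebook, newly arriving documents can be projected without retraining the map. This loosely mirrors the incremental setting mentioned above.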
