Externally growing self-organizing maps and its application to e-mail database visualization and exploration

In this paper we present an approach to organizing and classifying e-mails using self-organizing maps. The aim is, on the one hand, to provide an intuitive visual profile of the considered mailing lists and, on the other hand, to offer an intuitive navigation tool in which similar e-mails are located close to each other, so that the user can easily scan for e-mails with similar content. To evaluate this approach, we have developed a prototypical software tool that imports messages from a mailing list and arranges and groups these e-mails based on a similarity measure. The tool combines conventional keyword search methods with a visualization of the considered e-mail collection. The prototype is based on externally growing self-organizing maps, which solve some problems of conventional self-organizing maps and are computationally viable. Besides the underlying algorithms, we present and discuss some system evaluations to show the capabilities of the approach.
