On Document Classification with Self-Organising Maps

This research examines the use of self-organising maps for classifying text documents. The aim was to assign documents to separate classes according to their topics. We constructed self-organising maps suited to this task and tested them on German newspaper documents. We compared the results with those of k-nearest-neighbour searching and k-means clustering. For five and ten classes, the self-organising maps performed best, reaching average classification accuracies as high as 88-89%, whereas nearest-neighbour searching reached at most 74-83% and k-means clustering at most 72-79%.
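To illustrate the general idea of classifying documents with a self-organising map, the following is a minimal sketch in plain Python. It is not the configuration used in this study: the map size, the linear decay schedules, the majority-vote node labelling, the function names and the toy term-weight vectors are all assumptions chosen for illustration. Each document is assumed to be already represented as a fixed-length numeric vector.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position of the node whose weights are closest to x."""
    return min(weights,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map on equal-length vectors (illustrative schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map node, initialised with random values in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

def label_nodes(weights, vectors, labels):
    """Give each node the majority label of the training vectors mapped to it."""
    votes = {}
    for x, y in zip(vectors, labels):
        node = best_matching_unit(weights, x)
        counts = votes.setdefault(node, {})
        counts[y] = counts.get(y, 0) + 1
    return {node: max(counts, key=counts.get) for node, counts in votes.items()}

def classify(weights, node_labels, x):
    """Classify x with the label of the nearest node that received a label."""
    labelled = {n: weights[n] for n in node_labels}
    return node_labels[best_matching_unit(labelled, x)]

# Hypothetical toy data: four "documents" as term-weight vectors, two topics.
docs = [[1.0, 0.9, 0.0, 0.1], [0.9, 1.0, 0.1, 0.0],
        [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0]]
topics = ["sport", "sport", "politics", "politics"]

som = train_som(docs, rows=3, cols=3)
node_labels = label_nodes(som, docs, topics)
predicted = classify(som, node_labels, [0.95, 0.95, 0.05, 0.05])
```

The map itself is trained without labels; the class labels are used only afterwards to name the nodes, which is one common way to turn a self-organising map into a classifier that can be compared against k-nearest-neighbour searching.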
