Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods

This paper focuses on the use of self-organising maps, also known as Kohonen maps, for the classification of text documents. The aim is to classify documents effectively and automatically into separate classes based on their topics. Classification with the self-organising map was tested on three data sets, and the results were compared with those of six well-known baseline methods: k-means clustering, Ward's clustering, k-nearest-neighbour searching, discriminant analysis, the naive Bayes classifier and the classification tree. The self-organising map yielded the highest accuracies of the tested unsupervised methods in the classification of the Reuters news collection and the Spanish CLEF 2003 news collection, and accuracies comparable to those of some of the supervised methods on all three data sets.
