The learning vector quantization algorithm applied to automatic text classification tasks

Automatic text classification is an important task in many natural language processing applications. This paper presents a neural approach to building a text classifier based on the Learning Vector Quantization (LVQ) algorithm. LVQ is a classification method trained with a competitive supervised learning algorithm. The proposed method has been applied to two specific tasks: text categorization and word sense disambiguation. Experiments were carried out on the Reuters-21578 text collection (for text categorization) and the Senseval-3 corpus (for word sense disambiguation). The results are promising and suggest that the LVQ-based neural approach is a viable alternative to other classification systems.
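
As a rough illustration of the competitive supervised learning that LVQ performs, the following is a minimal sketch of the basic LVQ1 update rule in Python. It is not the authors' implementation: the two-dimensional term-weight vectors, the category labels, the initial prototypes, and the linearly decaying learning rate are all assumptions chosen for illustration. For each training vector, the nearest prototype "wins"; it is moved toward the vector if their labels match and away from it otherwise.

```python
import math
import random


def nearest(prototypes, x):
    """Return the index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i][0], x))


def train_lvq1(samples, prototypes, alpha=0.3, epochs=20, seed=0):
    """Train labelled prototypes with the basic LVQ1 rule.

    samples and prototypes are lists of (vector, label) pairs.
    alpha, epochs and the linear decay are illustrative assumptions.
    """
    rng = random.Random(seed)
    protos = [(list(w), label) for w, label in prototypes]  # work on a copy
    for epoch in range(epochs):
        rate = alpha * (1 - epoch / epochs)  # learning rate shrinks each epoch
        order = samples[:]
        rng.shuffle(order)
        for x, y in order:
            w, label = protos[nearest(protos, x)]  # competitive step: one winner
            sign = 1.0 if label == y else -1.0     # attract if correct, repel if not
            for j in range(len(w)):
                w[j] += sign * rate * (x[j] - w[j])
    return protos


def classify(protos, x):
    """Assign x the label of its nearest prototype."""
    return protos[nearest(protos, x)][1]


# Hypothetical toy data: each vector holds the weights of two index terms.
samples = [
    ([0.9, 0.1], "grain"),
    ([0.8, 0.2], "grain"),
    ([0.1, 0.9], "money-fx"),
    ([0.2, 0.7], "money-fx"),
]
initial = [([0.6, 0.4], "grain"), ([0.4, 0.6], "money-fx")]

model = train_lvq1(samples, initial)
print(classify(model, [0.85, 0.15]))
print(classify(model, [0.15, 0.80]))
```

In a real text categorization setting the vectors would have one dimension per index term and there would typically be one or more prototypes per category; the update rule itself stays the same.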
