Paper information - The learning vector quantization algorithm applied to automatic text classification tasks

Automatic text classification is an important task for many natural language processing applications. This paper presents a neural approach to develop a text classifier based on the Learning Vector Quantization (LVQ) algorithm. The LVQ model is a classification method that uses a competitive supervised learning algorithm. The proposed method has been applied to two specific tasks: text categorization and word sense disambiguation. Experiments were carried out using the Reuters-21578 text collection (for text categorization) and the Senseval-3 corpus (for word sense disambiguation). The results obtained are very promising and show that our neural approach based on the LVQ algorithm is an alternative to other classification systems.
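As a rough illustration of the competitive supervised learning mentioned in the abstract, the following is a minimal sketch of the basic LVQ1 update rule: the prototype nearest to a training vector moves toward it when their labels match and away from it otherwise. It is not the configuration used in the paper. The two-dimensional feature vectors, the category names, the learning-rate schedule, and all function names are assumptions made for illustration only.

```python
import random


def nearest(prototypes, x):
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(prototypes[i][0], x)),
    )


def train_lvq1(samples, prototypes, alpha=0.3, epochs=20, seed=0):
    """Basic LVQ1 training loop (hypothetical parameter values).

    samples and prototypes are lists of (vector, label) pairs.
    """
    rng = random.Random(seed)
    protos = [(list(w), label) for w, label in prototypes]  # copy, do not mutate input
    for epoch in range(epochs):
        rate = alpha * (1 - epoch / epochs)  # assumed linearly decaying learning rate
        order = samples[:]
        rng.shuffle(order)
        for x, y in order:
            w, label = protos[nearest(protos, x)]  # competitive step: winner only
            sign = 1.0 if label == y else -1.0     # attract on match, repel on mismatch
            for j in range(len(w)):
                w[j] += sign * rate * (x[j] - w[j])
    return protos


def classify(protos, x):
    """Assign x the label of its nearest prototype."""
    return protos[nearest(protos, x)][1]


# Toy data: invented 2-D term-weight vectors for two made-up categories.
training = [
    ([1.0, 0.0], "grain"), ([0.9, 0.1], "grain"),
    ([0.0, 1.0], "trade"), ([0.1, 0.9], "trade"),
]
initial = [([0.6, 0.4], "grain"), ([0.4, 0.6], "trade")]
model = train_lvq1(training, initial)
```

Calling `classify(model, [0.8, 0.2])` on this toy data assigns the vector to the category of its closest trained prototype. A real text classifier would use high-dimensional document or context vectors and typically several prototypes per class.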

Luis Alfonso Ureña López | Manuel García Vega | María Teresa Martín-Valdivia
