A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms

Most text categorization algorithms in the literature represent documents as collections of words. An alternative that has not been sufficiently explored is to represent them by word meanings, also known as senses. In this paper we use several classification algorithms to compare the categorization accuracy of word-based classifiers with that of sense-based classifiers. The comparison is carried out on a subset of the sense-annotated Brown Corpus semantic concordance. A series of experiments indicates that using senses does not significantly improve categorization accuracy.
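To make the contrast between the two document representations concrete, the sketch below builds a bag-of-words view and a bag-of-senses view of the same sense-tagged tokens and feeds each to a minimal multinomial Naive Bayes classifier. Everything in it is an assumption for illustration: the toy documents, the category names and the sense tags (written as hypothetical `word#label` strings, not real WordNet sense identifiers) are invented, and Naive Bayes is only a familiar stand-in, not necessarily one of the algorithms the paper uses. The data are contrived so that the two views disagree; they say nothing about accuracy on the real corpus, where the experiments reported above found no significant gain from senses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: each token is a (word, sense) pair, loosely mimicking
# a sense-annotated corpus. Tags such as "bank#finance" are invented labels for
# this sketch, not real WordNet sense identifiers.
TRAIN = [
    ([("bank", "bank#finance"), ("loan", "loan#money"), ("interest", "interest#rate")], "economy"),
    ([("bank", "bank#finance"), ("deposit", "deposit#money")], "economy"),
    ([("bank", "bank#river"), ("water", "water#liquid"), ("fish", "fish#animal")], "nature"),
    ([("river", "river#stream"), ("bank", "bank#river"), ("deposit", "deposit#sediment")], "nature"),
]

def word_features(doc):
    """Bag-of-words view: keep only the surface word of each token."""
    return Counter(word for word, _ in doc)

def sense_features(doc):
    """Bag-of-senses view: keep only the sense tag of each token."""
    return Counter(sense for _, sense in doc)

class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, feature_bags, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)  # class -> term frequencies
        self.vocab = set()
        for bag, label in zip(feature_bags, labels):
            self.term_counts[label].update(bag)
            self.vocab.update(bag)
        self.total_terms = {c: sum(tc.values()) for c, tc in self.term_counts.items()}
        return self

    def predict(self, bag):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus smoothed log likelihood of every term occurrence.
            score = math.log(self.class_counts[c] / n_docs)
            for term, freq in bag.items():
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                p = (self.term_counts[c][term] + 1) / (self.total_terms[c] + v)
                score += freq * math.log(p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

def train_and_predict(extract, test_doc):
    """Train on TRAIN using the given feature view, then classify test_doc."""
    bags = [extract(doc) for doc, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    return MultinomialNB().fit(bags, labels).predict(extract(test_doc))

# Ambiguous test document: as words, "bank" and "deposit" occur in both classes,
# but their sense tags occur only in "nature" training documents.
TEST = [("bank", "bank#river"), ("deposit", "deposit#sediment")]

print("words: ", train_and_predict(word_features, TEST))
print("senses:", train_and_predict(sense_features, TEST))
```

Run as-is, the word view labels the test document `economy` (the shorter "economy" training texts give "bank" and "deposit" slightly higher smoothed probabilities) while the sense view labels it `nature`, because the sense tags separate the two meanings of "bank" and "deposit". Swapping in a different classifier only changes the `MultinomialNB` class; the two feature extractors are the part that corresponds to the word-versus-sense comparison.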
