Concept Indexing for Automated Text Categorization

In this paper, we explore the potential of concept indexing with WordNet synsets for Text Categorization, in comparison with the traditional bag-of-words text representation model. We have performed a series of experiments in which we also test the possibility of using simple yet robust disambiguation methods for concept indexing, and the effectiveness of stoplist filtering and stemming on the SemCor semantic concordance. The results are not conclusive, yet promising.
