Multiple Label Text Categorization on a Hierarchical Thesaurus

In this paper we describe our work on automatically assigning relevant topics, taken from a structured thesaurus, to documents written in natural language. Our approach models thesaurus topic assignment as a multi-label classification problem in which the full set of possible classes is hierarchically organized.
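To make the problem setting concrete, the following is a minimal sketch of one common way to exploit a class hierarchy: top-down assignment, where a document is tested against a topic's children only if the topic itself was accepted. This is an illustrative assumption, not necessarily the method used in this work. The toy thesaurus, the keyword sets, and the names `HIERARCHY`, `KEYWORDS`, `accepts`, and `assign_topics` are all hypothetical; a real system would replace the keyword test with a trained classifier per topic.

```python
# Toy thesaurus (hypothetical): each topic maps to its child topics.
HIERARCHY = {
    "ROOT": ["science", "law"],
    "science": ["physics", "biology"],
    "law": ["tax law"],
    "physics": [],
    "biology": [],
    "tax law": [],
}

# Stand-in for one trained binary classifier per topic (assumption):
# a topic accepts a document if they share at least one keyword.
KEYWORDS = {
    "science": {"experiment", "theory", "cell", "quantum"},
    "physics": {"quantum", "particle"},
    "biology": {"cell", "gene"},
    "law": {"court", "tax", "statute"},
    "tax law": {"tax", "deduction"},
}


def accepts(topic, tokens):
    """Toy local decision for a single topic."""
    return bool(KEYWORDS[topic] & tokens)


def assign_topics(text, node="ROOT"):
    """Return every accepted topic, descending only into accepted parents."""
    tokens = set(text.lower().split())
    labels = []
    for child in HIERARCHY[node]:
        if accepts(child, tokens):
            labels.append(child)
            # Multi-label: a document may follow several branches at once.
            labels.extend(assign_topics(text, child))
    return labels


labels = assign_topics("quantum theory of the cell and the gene")
print(labels)
```

Because each accepted topic is explored independently, a single document can receive several labels at different depths of the hierarchy, which is what distinguishes this setting from single-label classification.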
