Hierarchical text classification and evaluation

Hierarchical classification refers to assigning a document one or more suitable categories from a hierarchical category space. While previous work in hierarchical classification focused on virtual category trees, in which documents are assigned only to leaf categories, we propose a top-down level-based classification method that can classify documents into both leaf and internal categories. Because the standard performance measures assume that categories are independent, they do not account for documents misclassified into categories that are similar or close to the correct ones in the category tree. We therefore propose category-similarity measures and distance-based measures that take the degree of misclassification into account when measuring classification performance. An experiment was carried out to measure the performance of the proposed hierarchical classification method. The results show that our method performs well on a Reuters text collection when enough training documents are given, and that the new measures do account for the contributions of misclassified documents.
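The abstract names two mechanisms without giving their formulas. The Python sketch below illustrates both, under assumptions that are not taken from the paper: the toy category tree, the keyword sets standing in for trained per-category classifiers, and the linear partial-credit rule `1 - d / threshold` are all hypothetical, and the category-similarity measures are not covered. `classify_top_down` pushes a document down the tree one level at a time and stops at a node none of whose children accept it. This lets it return an internal category such as `economy`. `distance_based_pr` then gives a misclassification into a nearby category partial credit, measuring closeness as the number of tree edges between the assigned and the correct category.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy category tree: child -> parent, None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "economy": "root",
    "markets": "economy",
    "trade": "economy",
    "politics": "root",
    "elections": "politics",
}
CHILDREN: Dict[str, List[str]] = {c: [] for c in PARENT}
for child, parent in PARENT.items():
    if parent is not None:
        CHILDREN[parent].append(child)

# Hypothetical keyword sets standing in for trained per-category classifiers.
KEYWORDS: Dict[str, Set[str]] = {
    "economy": {"inflation", "shares", "tariff"},
    "markets": {"shares", "index"},
    "trade": {"tariff", "export"},
    "politics": {"vote", "minister"},
    "elections": {"vote", "ballot"},
}


def classify_top_down(text: str) -> Set[str]:
    """Push a document down the tree level by level; stop where no child accepts."""
    words = set(text.lower().split())
    labels: Set[str] = set()
    level = ["root"]
    while level:
        next_level: List[str] = []
        for node in level:
            # A child "accepts" if it shares at least one keyword with the text.
            accepted = [c for c in CHILDREN[node] if KEYWORDS[c] & words]
            if accepted:
                next_level.extend(accepted)
            elif node != "root":
                labels.add(node)  # may be a leaf or an internal category
        level = next_level
    return labels


def path_to_root(cat: str) -> List[str]:
    path: List[str] = []
    node: Optional[str] = cat
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges between a and b via their lowest common ancestor."""
    depth_in_a = {c: i for i, c in enumerate(path_to_root(a))}
    for j, c in enumerate(path_to_root(b)):
        if c in depth_in_a:
            return depth_in_a[c] + j
    raise ValueError("categories are not in the same tree")


def credit(assigned: str, true_labels: Set[str], threshold: int) -> float:
    """Assumed rule: 1.0 if correct, decaying linearly to 0 at `threshold` edges."""
    d = min(tree_distance(assigned, t) for t in true_labels)
    return max(0.0, 1.0 - d / threshold)


def distance_based_pr(preds: List[Set[str]], truths: List[Set[str]], threshold: int = 4):
    """Micro-averaged precision/recall in which near misses earn partial credit."""
    assigned = sum(len(p) for p in preds)
    relevant = sum(len(t) for t in truths)
    p_credit = sum(credit(c, t, threshold) for p, t in zip(preds, truths) for c in p)
    r_credit = sum(
        max((credit(c, {lbl}, threshold) for c in p), default=0.0)
        for p, t in zip(preds, truths)
        for lbl in t
    )
    return p_credit / assigned, r_credit / relevant


if __name__ == "__main__":
    docs = ["Shares and the index fell", "Tariff talks stalled", "Inflation rose again"]
    preds = [classify_top_down(d) for d in docs]  # markets, trade, economy
    truths = [{"markets"}, {"markets"}, {"trade"}]
    print(distance_based_pr(preds, truths))  # (0.75, 0.75)
```

Only one of the three assignments is exactly correct, so standard precision and recall would both be 1/3. Under the assumed rule, the sibling miss (`trade` for `markets`, 2 edges) earns 0.5 and the parent miss (`economy` for `trade`, 1 edge) earns 0.75, which raises both scores to 0.75. The sketch does not guard against an empty prediction set, where `assigned` would be zero.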

Aixin Sun | Ee-Peng Lim