Paper information - Strategies for minimising errors in hierarchical web categorisation

On the Web, browsing and searching categories is a popular method of finding documents. Two well-known category-based search systems are the Yahoo! and DMOZ hierarchies, which are maintained by experts who assign documents to categories. However, manual categorisation by experts is costly and subjective, and it does not scale to the increasing volumes of data that must be processed. Several approaches to effective automatic text categorisation have been investigated. These include selection of categorisation methods, selection of pre-categorised training samples, use of hierarchies, and selection of document fragments or features. In this paper, we further investigate categorisation into Web hierarchies and the role of hierarchical information in improving categorisation effectiveness. We introduce new strategies to reduce errors in hierarchical categorisation. In particular, we propose novel techniques that shift the assignment into higher-level categories when lower-level assignment is uncertain. Our results show that absolute error rates can be reduced by over 2%.
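
The idea of shifting an uncertain assignment to a higher-level category can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract does not say how uncertainty is measured, so the confidence threshold, the `PARENT` map, and the function name `assign_with_backoff` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical toy hierarchy: child category -> parent category.
PARENT: Dict[str, str] = {
    "Science/Physics": "Science",
    "Science/Biology": "Science",
    "Arts/Music": "Arts",
}


def assign_with_backoff(scores: Dict[str, float], threshold: float = 0.6) -> str:
    """Return the best leaf category, or its parent if the leaf score is low.

    `scores` maps leaf categories to classifier confidences in [0, 1].
    The fixed threshold is an assumption, not a rule taken from the paper.
    """
    best_leaf = max(scores, key=scores.get)
    if scores[best_leaf] >= threshold:
        return best_leaf
    # Uncertain at the lower level: shift the assignment one level up.
    return PARENT.get(best_leaf, best_leaf)


confident = assign_with_backoff({"Science/Physics": 0.9, "Arts/Music": 0.1})
uncertain = assign_with_backoff(
    {"Science/Physics": 0.4, "Science/Biology": 0.35, "Arts/Music": 0.25}
)
```

Here `confident` stays at the leaf category, while `uncertain` falls back to the parent of the best-scoring leaf. A coarser but correct assignment is the trade-off the abstract describes.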

Hugh E. Williams | Wahyu Wibowo
