A hybrid classifier approach for Web retrieved documents classification

This paper presents a hybrid technique for classifying Web-returned hits into concept hierarchies. The technique combines manual and automatic classifiers. First, all documents returned from the Web are assigned to human-defined categories using manual classifiers; automatic classifiers are then used to generate a concept hierarchy for each of these categories. The evaluation reveals the following: (a) for polysemous queries, our system generates meaningful categories corresponding to, but not limited to, the different semantic facets of the queries; (b) as expected, for non-polysemous queries the system generates fewer categories; (c) the hierarchy precision of the concept hierarchies generated for polysemous queries is significantly better than that obtained using a baseline system.
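The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword rules in `MANUAL_RULES`, the `"other"` fallback category, and the use of term subsumption (term x is placed above term y when P(x|y) is at least a threshold and P(y|x) is below 1) for the automatic stage are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-written rules: category -> trigger keywords.
# These stand in for the "manual classifiers"; they are an assumption.
MANUAL_RULES = {
    "animal": {"cat", "wildlife", "species"},
    "car": {"engine", "dealer", "vehicle"},
}


def tokenize(text):
    """Lowercase and split on whitespace; returns the set of terms."""
    return set(text.lower().split())


def assign_categories(docs, rules):
    """Stage 1: put each document into every category whose keywords it mentions."""
    buckets = defaultdict(list)
    for doc in docs:
        terms = tokenize(doc)
        matched = [cat for cat, keywords in rules.items() if terms & keywords]
        # Documents matching no rule fall into an assumed "other" category.
        for cat in matched or ["other"]:
            buckets[cat].append(terms)
    return dict(buckets)


def build_hierarchy(doc_terms, threshold=0.8):
    """Stage 2 (assumed method): parent -> children via term subsumption.

    x subsumes y when P(x|y) >= threshold and P(y|x) < 1, with
    probabilities estimated from document frequencies in one category.
    """
    df = defaultdict(int)   # document frequency of each term
    co = defaultdict(int)   # co-occurrence count of each ordered term pair
    for terms in doc_terms:
        for x in terms:
            df[x] += 1
            for y in terms:
                if x != y:
                    co[(x, y)] += 1
    hierarchy = defaultdict(set)
    for (x, y), n in co.items():
        if n / df[y] >= threshold and n / df[x] < 1:
            hierarchy[x].add(y)
    return {parent: sorted(children) for parent, children in hierarchy.items()}


def classify_hits(docs, rules=MANUAL_RULES):
    """Run both stages: category -> concept hierarchy for that category."""
    return {
        cat: build_hierarchy(doc_terms)
        for cat, doc_terms in assign_categories(docs, rules).items()
    }
```

For a polysemous query such as "jaguar", calling `classify_hits` on a handful of snippets would yield one entry per matched category, each with its own small hierarchy; a single-document category yields an empty hierarchy because no term can be more general than another.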
