Paper information - Link-Local Features for Hypertext Classification

Previous work in hypertext classification has resulted in two principal approaches for incorporating information about the graph properties of the Web into the training of a classifier. The first approach uses the complete text of the neighboring pages, whereas the second approach uses only their class labels. In this paper, we argue that both approaches are unsatisfactory: the first brings in too much irrelevant information, while the second is too coarse because it abstracts the entire page into a single class label. We argue that one needs to focus on the relevant parts of predecessor pages, namely on the region in the neighborhood of the origin of an incoming link. To this end, we investigate different ways of extracting such features and compare several techniques for using them in a text classifier.
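The idea of restricting attention to the neighborhood of an incoming link's origin can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `link_local_features`, the fixed word window, the three feature groups, and the example URL are all assumptions chosen for illustration. The sketch uses only Python's standard `html.parser` module and collects the anchor text of each link to a target page, plus a few words before and after it on the predecessor page.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects the words of a page and the word spans of links to one target URL."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.words = []          # all text words, lowercased, in document order
        self.spans = []          # (start, end) word indices of matching anchors
        self._open_start = None  # word index where the current matching anchor began

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._open_start = len(self.words)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_start is not None:
            self.spans.append((self._open_start, len(self.words)))
            self._open_start = None

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def link_local_features(html, target_url, window=3):
    """Return anchor words and nearby words for each link to target_url (hypothetical helper)."""
    parser = LinkContextParser(target_url)
    parser.feed(html)
    parser.close()
    features = []
    for start, end in parser.spans:
        features.append({
            "anchor": parser.words[start:end],
            "before": parser.words[max(0, start - window):start],
            "after": parser.words[end:end + window],
        })
    return features


# Illustrative predecessor page; the URL is made up for this example.
page = ('<p>See the home page of '
        '<a href="http://example.org/prof">Professor Smith</a> '
        'for course notes and slides.</p>')
features = link_local_features(page, "http://example.org/prof", window=3)
```

Each resulting dictionary could then be turned into tokens for a text classifier of the target page, for instance by prefixing every word with its group name so that anchor words and context words remain distinct features. That encoding is likewise an assumption, not something the abstract specifies.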

Johannes Fürnkranz | Hervé Utard
