Paper information - Web document classification using modified decision trees

Web document classification using modified decision trees

Searching for Web pages is one of the most common tasks performed on the Web, and Web page classification is the first step in constructing a Web search service. This paper proposes a method for classifying Web documents using a modified decision tree of height three, which splits the root, the depth-one nodes, and the depth-two nodes based on keywords, descriptions, and hyperlinks, respectively. Classification starts with a Web page at the root of the decision tree and traces paths downward to the leaves, which give the categories of the page.
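The traversal described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how each split is computed, so everything below is an assumption made for illustration: the nested-dictionary tree layout, the `classify` function, the simple membership and substring tests used as split criteria, and the sample page and category names are all hypothetical and are not taken from the paper.

```python
def classify(page, tree):
    """Trace every matching root-to-leaf path and collect the leaf categories.

    Assumed layout (hypothetical): tree[keyword][description_term][link_term]
    holds a category name, giving one level each for keywords, descriptions,
    and hyperlinks.
    """
    categories = set()
    # Root: split on the page's keywords.
    for keyword, description_level in tree.items():
        if keyword not in page["keywords"]:
            continue
        # Depth-one nodes: split on terms from the page's description.
        for description_term, link_level in description_level.items():
            if description_term not in page["description"]:
                continue
            # Depth-two nodes: split on the page's hyperlinks.
            for link_term, category in link_level.items():
                if any(link_term in url for url in page["hyperlinks"]):
                    categories.add(category)  # a leaf names a category
    return categories


# Hypothetical tree and page, for illustration only.
tree = {
    "python": {
        "tutorial": {"docs.": "Programming/Tutorials"},
        "snake": {"zoo": "Animals/Reptiles"},
    },
}
page = {
    "keywords": {"python", "code"},
    "description": {"tutorial", "beginner"},
    "hyperlinks": ["https://docs.example.org/intro"],
}
result = classify(page, tree)
```

Because a page may match more than one path, the sketch returns a set of categories rather than a single label, which matches the abstract's statement that the leaves give the categories of the page.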

Gerhard X. Ritter | Kai-Hsiung Chang | Wen-Chen Hu

[1] C. Lee Giles, et al. Accessibility of information on the Web, 2000, INTL.

[2] Philip S. Yu, et al. Data Mining: An Overview from a Database Perspective, 1996, IEEE Trans. Knowl. Data Eng.

[3] Leo Breiman, et al. Classification and Regression Trees, 1984.

[4] Alberto O. Mendelzon, et al. Database techniques for the World-Wide Web: a survey, 1998, SGMD.