Web document classification using modified decision trees
Searching for Web pages is one of the most common tasks performed on the Web, and Web page classification is the first step in building a Web search service. This paper proposes a method for classifying Web documents using a modified decision tree of height three, which splits at the root, the depth-one nodes, and the depth-two nodes on keywords, descriptions, and hyperlinks, respectively. Classification starts with a Web page at the root of the decision tree and traces paths downward to the leaves, which give the categories of the page.
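The traversal described above can be illustrated with a minimal Python sketch. The abstract does not give the split criterion or the data structures, so everything here is an assumption made for illustration: the `Page` and `Node` classes, the `feature_terms` and `classify` helpers, the simple term-membership test used at each split, and the example tree and categories are all hypothetical and are not the authors' implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical container for the three features named in the abstract."""
    keywords: set
    description: str
    hyperlinks: list


@dataclass
class Node:
    # Maps a trigger term to a child Node (internal node) or to a
    # category string (leaf). The term-matching split is an assumption.
    branches: dict = field(default_factory=dict)


def feature_terms(page, depth):
    """Return the terms tested at a depth: 0 = keywords, 1 = description, 2 = hyperlinks."""
    if depth == 0:
        return {k.lower() for k in page.keywords}
    if depth == 1:
        return set(page.description.lower().split())
    # Depth 2: break each URL into lowercase alphanumeric tokens.
    return {
        token
        for url in page.hyperlinks
        for token in re.split(r"[^a-z0-9]+", url.lower())
        if token
    }


def classify(page, node, depth=0):
    """Follow every matching branch downward and collect the leaf categories."""
    terms = feature_terms(page, depth)
    categories = set()
    for term, child in node.branches.items():
        if term not in terms:
            continue
        if isinstance(child, str):
            categories.add(child)  # reached a leaf
        else:
            categories |= classify(page, child, depth + 1)
    return categories


# Hypothetical height-three tree: keywords, then description, then hyperlinks.
root = Node({
    "python": Node({
        "tutorial": Node({"docs": "Programming/Tutorials"}),
        "news": Node({"blog": "Programming/News"}),
    }),
    "football": Node({
        "league": Node({"scores": "Sports/Football"}),
    }),
})

page = Page(
    keywords={"Python", "programming"},
    description="A tutorial and news site for Python",
    hyperlinks=["https://docs.python.org/3/", "https://example.com/blog/latest"],
)

print(sorted(classify(page, root)))
```

Because a page may match more than one branch at a node, this sketch follows all matching paths and returns a set of categories, which is one reading of "traces paths downward to leaves". A page that matches no branch at the root receives an empty set.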