Growing parallel paths for entity-page discovery
In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an example entity-page (e.g., a department's site and the homepage of one of its faculty members), we seek to find all entity-pages of the same type (e.g., the homepages of all faculty members in the department). To do this, we propose a Web structure mining method that grows parallel paths through the Web graph and DOM trees. We show that by utilizing these parallel paths we can efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study on various domains.
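As a rough illustration of the DOM-tree side of this idea, the sketch below treats two links on a page as "parallel" when their anchors share the same root-to-anchor tag path, and returns every link parallel to a known example link. This is a simplified assumption made for illustration, not the method of the paper: the names `LinkPathCollector` and `parallel_links` are hypothetical, only a single page is examined, and no paths are grown through the Web graph.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkPathCollector(HTMLParser):
    """Record the root-to-anchor tag path of every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.links = []   # (tag_path, href) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(("/".join(self.stack), href))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parallel_links(html, seed_href):
    """Return hrefs whose anchors share a DOM tag path with the seed link (hypothetical helper)."""
    collector = LinkPathCollector()
    collector.feed(html)
    seed_paths = {path for path, href in collector.links if href == seed_href}
    return [href for path, href in collector.links if path in seed_paths]


page = (
    '<html><body><div><a href="/about">About</a></div>'
    '<ul><li><a href="/people/ann">Ann</a></li>'
    '<li><a href="/people/bob">Bob</a></li></ul></body></html>'
)
faculty = parallel_links(page, "/people/ann")
```

Here the example link `/people/ann` sits at the path `html/body/ul/li/a`, so the link to `/people/bob` is picked up as parallel while the navigation link `/about` is not. A tag path is a deliberately coarse signature. A fuller treatment would presumably also compare the link paths that lead from the site root to each candidate page.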