Domain-Independent Classification for Deep Web Interfaces

The data sources of the Deep Web provide a tremendous amount of high-quality structured data. However, because the Deep Web is so large that its domains are hard to predefine, its query interfaces must be classified in a domain-independent way. In this paper, we propose a novel three-stage approach to this problem. First, we extract both the text and the structure of each query interface by applying the FIE algorithm we proposed. Then we construct frequent itemsets using a frequent pattern mining algorithm. Finally, we apply the AP clustering algorithm to cluster the frequent itemsets according to FGSTD, a similarity measure presented in this paper. The experiment demonstrates that our approach clusters interfaces well without relying on predefined domains.
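The abstract does not specify FIE, FGSTD, or which frequent pattern miner is used, so the following is only a minimal sketch of the stage 2 and stage 3 pipeline under stated assumptions. It assumes that each interface's FIE output can be reduced to a set of attribute labels. It uses classic Apriori as the frequent pattern miner, and it reads "AP" as affinity propagation. Jaccard similarity stands in for FGSTD and is not the paper's measure. The preference (median similarity), the damping factor of 0.5, the fixed 200 iterations, and the sample labels are all hypothetical choices.

```python
from itertools import combinations
from statistics import median


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset found in >= min_support transactions."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        counts = {c: support(c) for c in candidates}
        level = {c for c, n in counts.items() if n >= min_support}
        frequent.update((c, counts[c]) for c in level)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def jaccard(a, b):
    # Stand-in similarity (assumption); NOT the paper's FGSTD measure.
    return len(a & b) / len(a | b)


def affinity_propagation(S, damping=0.5, iterations=200):
    """Affinity propagation by message passing; S[k][k] holds point k's preference."""
    n = len(S)
    if n < 2:
        return list(range(n))
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibility r(i,k): fitness of k as i's exemplar versus the best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability a(i,k): evidence from other points that k should be an exemplar.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]  # sum of positive r(j,k) over j != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Points with positive self-responsibility plus self-availability are exemplars.
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # degenerate run: fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    # Each non-exemplar joins the exemplar it is most similar to.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def cluster_itemsets(itemsets):
    n = len(itemsets)
    S = [[jaccard(a, b) for b in itemsets] for a in itemsets]
    if n > 1:
        # Common default: preference = median of the off-diagonal similarities.
        pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
        for i in range(n):
            S[i][i] = pref
    return affinity_propagation(S)


# Hypothetical attribute labels, one set per query interface (stand-in for FIE output).
interfaces = [
    {"title", "author", "isbn", "publisher"},
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"from", "to", "departure", "return"},
    {"from", "to", "departure", "passengers"},
    {"from", "to", "departure"},
]
frequent = apriori(interfaces, min_support=2)
itemsets = sorted(frequent, key=lambda s: (len(s), sorted(s)))
labels = cluster_itemsets(itemsets)
for itemset, label in zip(itemsets, labels):
    print(sorted(itemset), "-> exemplar", sorted(itemsets[label]))
```

Affinity propagation fits this stage because it does not need the number of clusters in advance, which matches the goal of not predefining domains. Jaccard similarity on small label sets produces many ties, and ties can make affinity propagation degenerate. A richer measure such as the paper's FGSTD would presumably be needed to obtain meaningful clusters.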

Ge Yu | Derong Shen | Tiezheng Nie | Siwei Wang | Yingjun Li