Constructing Interface Schemas for Search Interfaces of Web Databases

Many databases have become Web-accessible through form-based search interfaces (i.e., search forms) that allow users to specify complex and precise queries to access the underlying databases. In general, such a Web search interface can be considered as containing an interface schema with multiple attributes and rich semantic/meta information; however, the schema is not formally defined on the search interface. Many Web applications, such as Web database integration and deep Web crawling, require the construction of the schemas. In this paper, we introduce a schema model for complex search interfaces, and present a tool (WISE-iExtractor) for automatically extracting and deriving all the needed information to construct the schemas. Our experimental results on real search interfaces indicate that this tool is highly effective.

[1]  Kevin Chen-Chuan Chang,et al.  Mind your vocabulary: query mapping across heterogeneous information sources , 1999, SIGMOD '99.

[2]  Nicholas Kushmerick,et al.  Learning to Invoke Web Forms , 2003, OTM.

[3]  Joann J. Ordille,et al.  Querying Heterogeneous Information Sources Using Source Descriptions , 1996, VLDB.

[4]  Avigdor Gal,et al.  OntoBuilder: fully automatic extraction and consolidation of ontologies from Web sources , 2004, Proceedings. 20th International Conference on Data Engineering.

[5]  Mitesh Patel,et al.  Structured databases on the web: observations and implications , 2004, SGMD.

[6]  Clement T. Yu,et al.  WISE-cluster: clustering e-commerce search engines automatically , 2004, WIDM '04.

[7]  Clement T. Yu,et al.  WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce , 2003, VLDB.

[8]  Kevin Chen-Chuan Chang,et al.  Understanding Web query interfaces: best-effort parsing with hidden syntax , 2004, SIGMOD '04.

[9]  Andreas Paepcke,et al.  Efficient Web form entry on PDAs , 2001, WWW '01.

[10]  Clement T. Yu,et al.  Clustering e-commerce search engines , 2004, WWW Alt. '04.

[11]  Frederick H. Lochovsky,et al.  Data extraction and label assignment for web databases , 2003, WWW '03.

[12]  Silvana Castano,et al.  Semantic integration of heterogeneous information sources , 2001, Data Knowl. Eng..

[13]  Sriram Raghavan,et al.  Crawling the Hidden Web , 2001, VLDB.

[14]  Clement T. Yu,et al.  An interactive clustering-based approach to integrating source query interfaces on the deep Web , 2004, SIGMOD '04.

[15]  Kevin Chen-Chuan Chang,et al.  Statistical schema matching across web query interfaces , 2003, SIGMOD '03.

[16]  Tao Tao,et al.  Clustering Structured Web Sources: A Schema-Based, Model-Differentiation Approach , 2004, EDBT Workshops.

[17]  Clement T. Yu,et al.  Automatic extraction of web search interfaces for interface schema integration , 2004, WWW Alt. '04.