Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly

The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query forms. Helping users query alternative "deep Web" sources in the same domain (e.g., Books, Airfares) is an important task with broad applications. Dynamic query translation (i.e., translating a user's query across dynamically selected sources) is a core component of those applications, yet it has not been extensively explored. While existing works study isolated subproblems (e.g., schema matching, query rewriting), we aim to build a complete query translator and thus face new challenges: 1) To complete the translator, we need to solve the predicate mapping problem (i.e., mapping a source predicate to target predicates), which is largely unexplored by existing works; 2) To satisfy our application requirements, we need to design a customizable system architecture that assembles the components addressing the respective subproblems (i.e., schema matching, predicate mapping, query rewriting). To tackle these challenges, we develop a light-weight domain-based form assistant, which handles alternative sources in the same domain in a general way and is easily customizable to new domains. Our experiment shows that the form assistant is effective in translating queries for real Web sources.
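To make the predicate mapping problem concrete, the following is a minimal Python sketch. It is not the paper's algorithm: the `RangePredicate` type, the `map_range_predicate` function, and the price ranges are hypothetical, chosen only to illustrate mapping one source predicate onto the predicates a target form can express. It assumes a numeric attribute and a target form that offers only fixed ranges, and it selects every target range that overlaps the source range, so the selected ranges together cover it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangePredicate:
    """A numeric condition on one attribute over the half-open interval [low, high)."""
    attribute: str
    low: float
    high: float


def map_range_predicate(source, target_attribute, target_ranges):
    """Return the target predicates whose ranges overlap the source range.

    Assumption for illustration: the target form exposes only the fixed
    (low, high) ranges listed in target_ranges.
    """
    return [
        RangePredicate(target_attribute, low, high)
        for (low, high) in target_ranges
        if low < source.high and source.low < high  # interval overlap test
    ]


# Hypothetical source query: price under 35.
source = RangePredicate("price", 0, 35)
# Hypothetical target form: a selection list of fixed price ranges.
target_options = [(0, 25), (25, 45), (45, 65)]

mapped = map_range_predicate(source, "price range", target_options)
for predicate in mapped:
    print(predicate)
```

In this sketch the mapped predicates can return more than the source query asked for (prices from 35 up to 45), so a translator built this way would still need to filter the results against the original predicate.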
