Efficient evaluation of queries in a mediator for WebSources

We consider an architecture of mediators and wrappers for Internet-accessible WebSources with limited query capability. Each call to a source is a WebSource Implementation (WSI), and it is associated with both a capability and a (possibly dynamic) cost. The multiplicity of WSIs with varying costs and capabilities increases the complexity of a traditional optimizer, which must assign a WSI to each remote relation in the query while generating an (optimal) plan. We present a two-phase Web Query Optimizer (WQO). In a pre-optimization phase, the WQO selects one or more WSIs for a pre-plan; a pre-plan represents the space of query evaluation plans (plans) based on this choice of WSIs. The WQO uses cost-based heuristics to evaluate the WSI assignment in each pre-plan and to choose a good pre-plan. It then uses the chosen pre-plan to drive an extended relational optimizer, which obtains the best plan for that pre-plan. A prototype of the WQO has been developed. We compare the effectiveness of the WQO, i.e., its ability to efficiently search a large space of plans and obtain a low-cost plan, with that of a traditional optimizer. We also validate the cost-based heuristics by experimental evaluation of queries in the noisy Internet environment.
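The two-phase idea can be illustrated with a minimal Python sketch. It is not the WQO itself: the abstract does not specify the heuristics or the extended relational optimizer, so everything here is an assumption made for illustration. The `WSI` fields (`needs`, `gives`, `cost`), the function names, the example sources, and the cost figures are all hypothetical. Phase 1 is approximated by ranking WSI assignments (pre-plans) by summed estimated cost, and phase 2 by a greedy search for an ordering of calls that respects each WSI's capability, i.e., the attributes that must be bound before the call can be made.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class WSI:
    """Hypothetical model of one WebSource Implementation."""
    name: str
    relation: str      # remote relation this WSI implements
    needs: frozenset   # attributes that must be bound before the call
    gives: frozenset   # attributes the call returns
    cost: float        # estimated cost, in arbitrary units


def rank_preplans(wsis, relations):
    """Phase 1 (assumed heuristic): one WSI per relation, cheapest sum first."""
    per_relation = [[w for w in wsis if w.relation == r] for r in relations]
    preplans = [tuple(choice) for choice in product(*per_relation)]
    return sorted(preplans, key=lambda p: sum(w.cost for w in p))


def order_calls(preplan, given=frozenset()):
    """Phase 2 stand-in: greedily order calls so every binding is satisfied.

    Returns None when no ordering satisfies the capabilities.
    """
    known, pending, order = set(given), list(preplan), []
    while pending:
        ready = [w for w in pending if w.needs <= known]
        if not ready:
            return None
        nxt = min(ready, key=lambda w: w.cost)
        order.append(nxt)
        pending.remove(nxt)
        known |= nxt.needs | nxt.gives
    return order


def optimize(wsis, relations, given=frozenset()):
    """Return (call order, total cost) for the first feasible pre-plan."""
    for preplan in rank_preplans(wsis, relations):
        order = order_calls(preplan, given)
        if order is not None:
            return [w.name for w in order], sum(w.cost for w in order)
    return None


# Made-up sources: two WSIs per relation with different capabilities and costs.
EXAMPLE_WSIS = [
    WSI("paper_by_title", "paper", frozenset({"title"}), frozenset({"paper_id"}), 2.0),
    WSI("paper_scan", "paper", frozenset(), frozenset({"title", "paper_id"}), 10.0),
    WSI("author_by_paper", "author", frozenset({"paper_id"}), frozenset({"author"}), 1.0),
    WSI("author_scan", "author", frozenset(), frozenset({"paper_id", "author"}), 8.0),
]
```

With these made-up sources, `optimize(EXAMPLE_WSIS, ["paper", "author"], given=frozenset({"title"}))` selects the two cheap lookup WSIs, whereas without a bound `title` the cheapest pre-plans are infeasible and the sketch falls back to `paper_scan` followed by `author_by_paper`. Separating the choice of WSIs from the ordering of calls is the point of the illustration; a real optimizer would replace both the ranking and the greedy ordering with far richer cost-based search.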
