Capabilities-Based Query Rewriting in Mediator Systems

Users today are struggling to integrate a broad range of information sources providing different levels of query capabilities. Currently, data sources with different and limited capabilities are accessed either by writing rich functional wrappers for the more primitive sources, or by dealing with all sources at a “lowest common denominator”. This paper explores a third approach, in which a mediator ensures that sources receive queries they can handle, while still taking advantage of all of the query power of the source. We propose an architecture that enables this, and identify a key component of that architecture, the Capabilities-Based Rewriter (CBR). The CBR takes as input a description of the capabilities of a data source, and a query targeted for that data source. From these, the CBR determines component queries to be sent to the sources, commensurate with their abilities, and computes a plan for combining their results using joins, unions, selections, and projections. We provide a language to describe the query capability of data sources and a plan generation algorithm. Our description language and plan generation algorithm are schema independent and handle SPJ queries. We also extend CBR with a cost-based optimizer. The net effect is that we prune without losing completeness. Finally we compare the implementation of a CBR for the Garlic project with the algorithms proposed in this paper.

[1]  Laura M. Haas,et al.  Optimizing Queries Across Diverse Data Sources , 1997, VLDB.

[2]  Xiaolei Qian,et al.  Query folding , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[3]  Patrick Valduriez,et al.  Scaling heterogeneous databases and the design of Disco , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.

[4]  Guy M. Lohman,et al.  Grammar-like functional rules for representing query optimization alternatives , 1988, SIGMOD '88.

[5]  Anand Rajaraman,et al.  Answering Queries Using Limited External Processors. , 1996, ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.

[6]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.

[7]  Weimin Du,et al.  The Pegasus heterogeneous multidatabase system , 1991, Computer.

[8]  Per-Åke Larson,et al.  Computing Queries from Derived Relations , 1985, VLDB.

[9]  Jennifer Widom,et al.  Object exchange across heterogeneous information sources , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[10]  Jeffrey D. Ullman,et al.  Answering queries using limited external query processors (extended abstract) , 1996, PODS.

[11]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[12]  Dennis McLeod,et al.  An Approach to Resolving Semantic Heterogenity in a Federation of Autonomous, Heterogeneous Database Systems , 1993, Int. J. Cooperative Inf. Syst..

[13]  Amar Gupta,et al.  Integration of Information Systems: Bridging Heterogeneous Databases , 1989 .

[14]  Joann J. Ordille,et al.  The World Wide Web as a Collection of Views: Query Processing in the Information Manifold , 1996, VIEWS.

[15]  Jeffrey D. Ullman,et al.  A Query Translation Scheme for Rapid Implementation of Wrappers , 1995, DOOD.

[16]  Laura M. Haas,et al.  Towards heterogeneous multimedia information systems: the Garlic approach , 1995, Proceedings RIDE-DOM'95. Fifth International Workshop on Research Issues in Data Engineering-Distributed Object Management.

[17]  Guy M. Lohman,et al.  Query Optimization in the IBM DB2 Family. , 1993 .

[18]  Roger King,et al.  Amalgame: A Tool for Creating Interoperating, Persistent, Heterogeneous Components , 1993, Advanced Database Systems.

[19]  Jeffrey D. Ullman,et al.  Principles of Database and Knowledge-Base Systems, Volume II , 1988, Principles of computer science series.

[20]  Yannis Papakonstantinou,et al.  Describing and Using Query Capabilities of Heterogeneous Sources , 1997, VLDB.

[21]  Mihalis Yannakakis,et al.  Equivalences Among Relational Expressions with the Union and Difference Operators , 1980, J. ACM.

[22]  Ronald Fagin,et al.  Combining fuzzy information from multiple systems (extended abstract) , 1996, PODS.

[23]  Ronald Fagin,et al.  Combining Fuzzy Information from Multiple Systems , 1999, J. Comput. Syst. Sci..

[24]  Jeffrey D. Ullman,et al.  Answering queries using templates with binding patterns (extended abstract) , 1995, PODS '95.

[25]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.

[26]  Jeffrey D. Uuman Principles of database and knowledge- base systems , 1989 .

[27]  Anand Rajaraman,et al.  Answering queries using templates with binding patterns (extended abstract) , 1995, PODS.

[28]  Laura M. Haas,et al.  Capabilities-Based Query Rewriting in Mediator Systems , 1996, Fourth International Conference on Parallel and Distributed Information Systems.