Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System

An important issue for federated systems of diverse data sources is how to optimize cross-source queries without building knowledge of individual sources into the optimizer. Garlic is a federated system with an emphasis on extensibility and diverse sources. To achieve these goals, data sources are attached to Garlic by means of a wrapper. Wrappers participate in query planning, telling Garlic what parts of a query a data source can do and how much it will cost. This paper describes a framework through which wrappers provide the necessary cost and cardinality information for optimization, and the facilities Garlic provides to make this task easier. Our framework makes it easy for wrappers to provide cost information, requires few changes to a conventional bottom-up optimizer, and is easily extensible to a broad range of sources. We believe that our framework for costing is the first to allow accurate cost estimates for diverse sources within the context of a traditional cost-based optimizer. We demonstrate the importance of cost information in choosing good plans, the flexibility of our framework, the accuracy it allows, and finally, that it works: the optimizer is able to choose good plans even for complex cross-source queries.
