Efficiently ordering query plans for data integration

The goal of a data integration system is to provide a uniform interface to a multitude of data sources. Given a user query formulated in this interface, the system translates it into a set of query plans. Each plan is a query formulated over the data sources, and specifies a way to access sources and combine data to answer the user query. In practice, when the number of sources is large, a data integration system must generate and execute many query plans whose utilities vary significantly. Hence, it is crucial that the system find the best plans efficiently and execute them first, to guarantee both an acceptable time to the first answers and acceptable quality of those answers. We describe efficient solutions to this problem. First, we formally define the problem of ordering query plans. Second, we identify several interesting structural properties of the problem and describe three ordering algorithms that exploit these properties. Finally, we describe experimental results that offer guidance on which algorithms perform best under which conditions.
