Optimizing source-call ordering in Information Gathering Plans

In this paper we consider the problem of optimizing the order in which source relations are joined in information gathering plans. This problem differs significantly from the traditional database query optimization problem: sources on the Internet have a variety of access limitations, and execution cost in information gathering is affected both by network traffic and by connection setup costs. We describe a way of representing the access capabilities of sources, and provide a greedy algorithm for ordering source calls that respects source limitations. Our algorithm also takes both access costs and traffic costs into account, without requiring full source statistics. This algorithm is being evaluated in the context of Emerac, our prototype information gathering system.
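To make the idea of a capability-respecting greedy ordering concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details the abstract does not give. It assumes that each source's access limitation can be modelled as a set of attributes that must be bound before the source is called, and that a single number per source stands in for connection setup cost and another for expected traffic. The `Source` class, its fields, the `greedy_order` function, and the example sources are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical source description (an assumption, not the paper's model)."""
    name: str
    requires: frozenset   # attributes that must be bound before the call
    provides: frozenset   # attributes that are bound after the call
    access_cost: float    # assumed per-call connection setup cost
    traffic_cost: float   # assumed rough estimate of data transfer cost


def greedy_order(sources, initially_bound=()):
    """Pick, at each step, the cheapest source whose required bindings are met."""
    bound = set(initially_bound)
    remaining = list(sources)
    order = []
    while remaining:
        # Only sources whose access limitations are satisfied may be called now.
        feasible = [s for s in remaining if s.requires <= bound]
        if not feasible:
            names = ", ".join(s.name for s in remaining)
            raise ValueError("no feasible call order for: " + names)
        # Combine both cost components; break ties by name for determinism.
        best = min(feasible, key=lambda s: (s.access_cost + s.traffic_cost, s.name))
        order.append(best.name)
        bound |= best.provides
        remaining.remove(best)
    return order


# Illustrative sources: only "catalog" can be called with nothing bound.
example_sources = [
    Source("reviews", frozenset({"title"}), frozenset({"review"}), 2.0, 5.0),
    Source("catalog", frozenset(), frozenset({"title", "author"}), 1.0, 20.0),
    Source("prices", frozenset({"title"}), frozenset({"price"}), 1.0, 1.0),
]
example_order = greedy_order(example_sources)
```

In this example `catalog` must come first even though it is the most expensive call, because the other two sources need `title` to be bound. A purely cost-based ordering that ignored binding requirements would produce a plan that cannot be executed.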
