Adaptive Multi-join Query Processing in PDBMS

Traditionally, distributed databases assume that the (small) set of nodes participating in a query is known a priori, the data is well placed, and the statistics are readily available. However, these assumptions are no longer valid in a Peer-based DataBase Management System (PDBMS). As such, it is a challenge to process and optimize queries in a PDBMS. In this paper, we present our distributed solution to this problem for multi-way join queries. Our approach first processes a multi-way join query based on an initial query evaluation plan (generated using statistical data that may be obsolete or inaccurate); as the query is being processed, statistics obtained on-the-fly are used to (continuously) refine the current plan dynamically into a more effective one. We have conducted an extensive performance study which shows that our adaptive query processing strategy can reduce the network traffic significantly.
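The idea of starting from a plan built on stale statistics and refining it as execution proceeds can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a toy setting in which every relation shares one hypothetical join attribute `k`, the intermediate result is shipped to the peer holding the next relation, and network traffic is approximated by the number of tuples shipped. The names `run_plan`, `observed_join_size`, `estimates`, and the sample relations are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, int]
Relation = List[Row]
KEY = "k"  # hypothetical join attribute shared by all relations


def join(left: Relation, right: Relation) -> Relation:
    """Hash join of two relations on KEY."""
    index: Dict[int, List[Row]] = {}
    for row in right:
        index.setdefault(row[KEY], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[KEY], [])]


def observed_join_size(current: Relation, candidate: Relation) -> int:
    """Statistic gathered on the fly: exact size of current joined with candidate."""
    counts = Counter(row[KEY] for row in candidate)
    return sum(counts[row[KEY]] for row in current)


def run_plan(relations: Dict[str, Relation], estimates: Dict[str, int],
             adaptive: bool) -> Tuple[Relation, List[str], int]:
    """Join all relations; return (result, join order, tuples shipped)."""
    remaining = dict(relations)
    # Initial plan: start at the relation believed smallest (estimates may be stale).
    first = min(remaining, key=lambda name: estimates[name])
    current = remaining.pop(first)
    order, shipped = [first], 0
    while remaining:
        if adaptive:
            # Refine the plan: pick the next relation using observed statistics.
            nxt = min(remaining,
                      key=lambda name: observed_join_size(current, remaining[name]))
        else:
            # Static plan: keep trusting the initial estimates.
            nxt = min(remaining, key=lambda name: estimates[name])
        shipped += len(current)  # intermediate result travels to the next peer
        current = join(current, remaining.pop(nxt))
        order.append(nxt)
    return current, order, shipped


# Assumed sample data: the estimates for S and T are deliberately wrong.
relations = {
    "R": [{"k": i, "r": i} for i in range(4)],
    "S": [{"k": i % 4, "s": i} for i in range(12)],
    "T": [{"k": 0, "t": 7}],
}
estimates = {"R": 4, "S": 5, "T": 100}

static_result, static_order, static_shipped = run_plan(relations, estimates, adaptive=False)
adaptive_result, adaptive_order, adaptive_shipped = run_plan(relations, estimates, adaptive=True)
print(static_order, static_shipped)      # ['R', 'S', 'T'] 16
print(adaptive_order, adaptive_shipped)  # ['R', 'T', 'S'] 5
```

In this toy run both plans produce the same three result tuples, but the adaptive plan visits the highly selective relation `T` first and ships fewer tuples. A real PDBMS would obtain such statistics by sampling or piggybacking on partial results rather than by an exact count, and would also weigh the cost of collecting them.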
