Using partial evaluation in distributed query evaluation

A basic idea in parallel query processing is that one is prepared to do more computation than strictly necessary at individual sites in order to reduce the elapsed time, the network traffic, or both in the evaluation of a query. We develop this idea for the evaluation of Boolean XPath queries over a tree that is fragmented, both horizontally and vertically, over a number of sites. The key idea is to send the whole query to each site. Each site partially evaluates the query, in parallel with the others, and sends its results as compact Boolean functions to a coordinator, which combines them to obtain the final answer. This approach has several advantages. First, each site is visited only once, even if several fragments of the tree are stored at that site. Second, no prior constraints on how the tree is decomposed are needed, nor is any structural information about the tree, such as a DTD, required. Third, there is a satisfactory bound on the total computation performed on all sites and on the total network traffic. We also develop a simple incremental maintenance algorithm that requires communication only with the sites at which changes have taken place; moreover, the network traffic depends neither on the data nor on the update. These results, we believe, illustrate the usefulness and potential of partial evaluation, in distributed systems as well as in centralized XML stores, for evaluating XPath queries and beyond.
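The scheme described above can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it assumes a single, much simpler Boolean query ("does any node carry a given label?") rather than general XPath, and every name in it (`Node`, `Hole`, `partial_eval`, `evaluate_site`, `combine`, the fragment identifiers) is hypothetical. Each fragment is evaluated locally into a small Boolean function of the form `found OR x_f1 OR x_f2 ...`, where each variable stands for the still unknown answer of a fragment stored elsewhere; the coordinator then resolves the variables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union


@dataclass
class Hole:
    """Placeholder for a subtree that is stored in another fragment."""
    fragment_id: str


@dataclass
class Node:
    label: str
    children: List[Union["Node", Hole]] = field(default_factory=list)


# Assumed encoding of a partial answer: the Boolean function
# "found OR x_f1 OR x_f2 ..." stored as (found, {f1, f2, ...}).
PartialAnswer = Tuple[bool, Set[str]]


def partial_eval(root: Node, label: str) -> PartialAnswer:
    """Evaluate 'some node is labelled `label`' on a single fragment."""
    found, pending = False, set()
    stack: List[Union[Node, Hole]] = [root]
    while stack:
        item = stack.pop()
        if isinstance(item, Hole):
            pending.add(item.fragment_id)  # answer depends on another fragment
        else:
            found = found or item.label == label
            stack.extend(item.children)
    return found, pending


def evaluate_site(fragments: Dict[str, Node], label: str) -> Dict[str, PartialAnswer]:
    """One visit per site: evaluate every fragment stored at that site."""
    return {fid: partial_eval(root, label) for fid, root in fragments.items()}


def combine(answers: Dict[str, PartialAnswer], root_id: str) -> bool:
    """Coordinator: resolve the variables, starting from the root fragment."""
    stack, seen = [root_id], set()
    while stack:
        fid = stack.pop()
        if fid in seen:
            continue
        seen.add(fid)
        found, pending = answers[fid]
        if found:
            return True
        stack.extend(pending)
    return False


# Hypothetical data: three fragments spread over two sites.
site_a = {"F0": Node("library", [Node("shelf", [Hole("F1")]), Hole("F2")])}
site_b = {
    "F1": Node("book", [Node("title")]),
    "F2": Node("journal", [Node("issue")]),
}

answers: Dict[str, PartialAnswer] = {}
for site in (site_a, site_b):  # in a real deployment the sites run in parallel
    answers.update(evaluate_site(site, "title"))

print(combine(answers, "F0"))
```

In this sketch each site returns one small pair per fragment rather than any tree data, which mirrors the idea of shipping compact Boolean functions to the coordinator; the actual encoding and the bounds claimed in the abstract are not reproduced here.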
