WIN: an efficient data placement strategy for parallel XML databases

Parallel database systems execute operations in parallel to reduce response time and improve system throughput. Data placement is a key factor in the overall performance of such systems. Because XML data is semistructured, traditional data placement strategies do not serve it well. In this paper, we present the concept of an intermediary node (INode) and propose WIN, a novel workload-aware data placement strategy that declusters XML data effectively to achieve high intra-query parallelism. Extensive experiments show that WIN outperforms previous strategies in both speedup and scaleup.
