Querying XML Data using PC Cluster System

This paper proposes a novel approach for querying large-scale XML data using a PC cluster system. With the recent spread of the XML format, large-scale data encoded in XML, ranging from several hundred megabytes to several gigabytes, has become common. However, XML databases are often inefficient at handling such large data because of the complexity of the XML data model and of XML query processing. To cope with this problem, we construct a parallel XML database on top of a PC cluster system. To this end, we discuss how to partition XML data so that XML queries can be processed in parallel, and we introduce a path-based partitioning scheme for XML data. The resulting XML fragments are then allocated to cluster nodes. To obtain a cost-efficient allocation of the fragments, we discuss cost functions for parallel XPath processing and an algorithm, based on a genetic algorithm, that computes a pseudo-optimal allocation. Finally, we demonstrate the effectiveness of the proposed scheme through a series of experiments.
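As a rough illustration of the allocation step, the following Python sketch assigns fragments to cluster nodes with a simple genetic algorithm. It is not the paper's method: the abstract does not give the cost functions, so the sketch assumes each fragment has a single estimated processing cost and that the objective is to minimize the load of the busiest node. The names `makespan` and `allocate_fragments`, and all parameter values, are hypothetical.

```python
import random


def makespan(allocation, costs, num_nodes):
    """Load of the busiest node (assumed objective, not the paper's cost function)."""
    loads = [0.0] * num_nodes
    for fragment, node in enumerate(allocation):
        loads[node] += costs[fragment]
    return max(loads)


def allocate_fragments(costs, num_nodes, pop_size=40, generations=200,
                       mutation_rate=0.05, seed=0):
    """Search for a low-makespan fragment-to-node allocation with a basic GA.

    An individual is a list in which position i holds the node of fragment i.
    """
    rng = random.Random(seed)
    n = len(costs)

    def fitness(individual):
        return makespan(individual, costs, num_nodes)

    # Start from random allocations.
    population = [[rng.randrange(num_nodes) for _ in range(n)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]

        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            # One-point crossover.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally move a fragment to a random node.
            child = [rng.randrange(num_nodes) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)

        population = survivors + children

    return min(population, key=fitness)


if __name__ == "__main__":
    fragment_costs = [5, 3, 8, 2, 7, 4]  # made-up cost estimates
    best = allocate_fragments(fragment_costs, num_nodes=3)
    print(best, makespan(best, fragment_costs, 3))
```

Because the better half of each generation is carried over unchanged, the best allocation found never gets worse between generations. A realistic cost function for parallel XPath processing would presumably also account for query workload and inter-node communication, which this sketch ignores.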
