XCube: Processing XPath queries in a hypercube overlay network

In this paper, we present the design and performance of XCube, a tag-based system for managing XML data in a hypercube overlay network. In XCube, each node in a d-dimensional hypercube is identified by a d-bit vector, and each peer manages a smaller hypercube of dimension d′ < d. An XML document is compactly represented by a structure summary and a content summary. The structure summary comprises a d-bit vector derived from the distinct tag names in the document and a synopsis that captures the document's structure. The content summary is a bitmap that summarizes the document's content. A document's metadata (owner IP address, document identifier, structure summary, and content summary) is indexed at its anchor peer, i.e., the peer that manages the node whose bit vector matches the document's bit vector. In addition, the structure summary is indexed at every peer that manages a node whose bit vector is covered by the document's bit vector. An XPath query is processed in four phases. In phase 1, the query is routed to its anchor peer according to its bit vector. In phase 2, the query is evaluated against all the synopses stored at its anchor peer and forwarded to the anchor peers of the matching synopses. In phase 3, the anchor peer of each matching synopsis checks the query against the associated bitmaps and forwards it to the relevant owner peers. Finally, in phase 4, the owner peers evaluate the query on their XML documents and return the answers to the querying peer. We also present a scheme that dynamically partitions the hypercube to balance the load across peers, and we exploit the partition history to eliminate redundant messages. We conduct a comprehensive experimental study, and the results demonstrate the efficiency of XCube.
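The indexing rule can be illustrated with a small Python sketch. It rests on a containment argument: if a query's tags are a subset of a document's tags and both bit vectors are built with the same tag-to-bit mapping, every bit set in the query vector q is also set in the document vector v, so q is one of the nodes covered by v and the document's structure summary is already stored at the query's anchor. The sketch is not the authors' implementation. It assumes a CRC-32 hash from tag name to bit position, example dimensions d = 8 and d′ = 5, a static partition in which a peer owns the nodes sharing the high-order d − d′ bits (XCube itself partitions the hypercube dynamically), and dimension-ordered (e-cube) routing for phase 1. All helper names are hypothetical, synopses and content bitmaps are omitted, and phases 3 and 4 are not modeled.

```python
import zlib

D = 8        # hypothetical hypercube dimension d
D_PRIME = 5  # hypothetical dimension d' of the sub-hypercube each peer manages


def tag_bit_vector(tags, d=D):
    """Set one bit per distinct tag name (CRC-32 tag-to-bit hashing is an assumption)."""
    v = 0
    for tag in set(tags):
        v |= 1 << (zlib.crc32(tag.encode("utf-8")) % d)
    return v


def covers(v, b):
    """True if every 1-bit of node id b is also set in v."""
    return (b & v) == b


def covered_nodes(v):
    """Yield every node id covered by v: all submasks of v, from v down to 0."""
    b = v
    while True:
        yield b
        if b == 0:
            return
        b = (b - 1) & v


def anchor_peer(node, d_prime=D_PRIME):
    """Hypothetical static partition: peer id = the high-order d - d' bits of the node id."""
    return node >> d_prime


def ecube_route(src, dst, d=D):
    """Dimension-ordered routing: flip each differing bit, lowest dimension first."""
    path, cur = [src], src
    for i in range(d):
        if ((cur ^ dst) >> i) & 1:
            cur ^= 1 << i
            path.append(cur)
    return path


def index_document(doc_id, tags, index):
    """Store a (doc_id, vector) entry at every covered node; real synopses are omitted."""
    v = tag_bit_vector(tags)
    for b in covered_nodes(v):
        index.setdefault(b, []).append((doc_id, v))
    return v


def candidate_documents(query_tags, index):
    """Phases 1-2 sketch: entries at the query's anchor node all have vectors covering q."""
    q = tag_bit_vector(query_tags)
    return q, index.get(q, [])


if __name__ == "__main__":
    index = {}
    index_document("d1", ["book", "title", "author"], index)
    index_document("d2", ["book", "price"], index)
    q, cands = candidate_documents(["book", "title"], index)
    print(f"query vector {q:08b} -> anchor peer {anchor_peer(q)}")
    print("route from node 0:", [f"{n:08b}" for n in ecube_route(0, q)])
    print("candidates:", [doc for doc, _ in cands])
```

Document d1 is always a candidate here because its tags contain the query's tags. Whether d2 also appears depends on hash collisions between tag names, so the candidate set at the anchor can include false positives; in XCube such documents would be filtered when the query is evaluated against the synopses in phase 2. Note also that a document with k set bits is indexed at 2^k nodes in this sketch.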
