Approximate XML Query Answers in DHT-Based P2P Networks

As the number of independent data providers on the web increases, a growing number of web applications need to locate data sources distributed over the Internet. Most current proposals in the literature focus on developing effective routing data synopses to answer simple XPath queries in structured or unstructured P2P networks. In this paper, we present an effective framework to support XPath queries extended with full-text search predicates over schemaless XML data distributed in a DHT-based P2P network. We construct two concise routing data synopses, termed the structural summary and the peer-document synopsis, to route a user query to the most relevant peers, that is, the peers owning documents that can satisfy the query. To evaluate the structural components of the query, we develop a general query footprint derivation algorithm that extracts the query footprint from the query and matches it against the structural summaries. To improve search performance, we adopt a lazy evaluation strategy for the full-text search predicates in the query. Finally, we develop effective strategies to balance the data load distribution in the system. We conduct extensive experiments to show the scalability of our system, validate the efficiency and accuracy of our routing data synopses, and demonstrate the effectiveness of our load balancing schemes.
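
The routing idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the footprint derivation, the synopsis formats, or the DHT layout, so everything here is an assumption made for illustration. In particular, the `contains(., "...")` predicate syntax, the function names `derive_footprint`, `matches`, and `route`, and the in-memory dictionaries standing in for the structural summaries and peer-document synopses are all hypothetical, and the DHT itself is not modeled.

```python
import re

# Assumed (hypothetical) syntax for a full-text predicate: [contains(., "term")]
FULLTEXT = re.compile(r'\[\s*contains\(\s*\.\s*,\s*"([^"]*)"\s*\)\s*\]')


def derive_footprint(query):
    """Split a query into its structural part (the footprint) and its search terms."""
    terms = FULLTEXT.findall(query)
    footprint = FULLTEXT.sub("", query)
    return footprint, terms


def matches(footprint, label_path):
    """Test a footprint made of / and // steps against one root-to-node label path."""
    pattern = ""
    for axis, label in re.findall(r"(//|/)([^/]+)", footprint):
        name = r"[^/]+" if label == "*" else re.escape(label)
        # '//' permits any number of intermediate labels before this step.
        pattern += (r"(?:/[^/]+)*/" if axis == "//" else "/") + name
    return re.fullmatch(pattern, label_path) is not None


def route(query, summaries, synopses):
    """Return the peers that may answer the query.

    summaries: peer -> set of label paths (stand-in for a structural summary)
    synopses:  peer -> set of lowercase terms (stand-in for a peer-document synopsis)
    """
    footprint, terms = derive_footprint(query)
    # Structural step first: keep peers with at least one matching label path.
    candidates = [
        peer for peer, paths in summaries.items()
        if any(matches(footprint, path) for path in paths)
    ]
    # Lazy step: check search terms only for peers that survived the structural step.
    return sorted(
        peer for peer in candidates
        if all(term.lower() in synopses.get(peer, set()) for term in terms)
    )


summaries = {
    "peerA": {"/dblp/article/title"},
    "peerB": {"/dblp/book/title"},
    "peerC": {"/dblp/article/title"},
}
synopses = {"peerA": {"xml", "p2p"}, "peerB": {"xml"}, "peerC": {"sql"}}
selected = route('//article/title[contains(., "xml")]', summaries, synopses)
```

In this toy setup, the structural step discards `peerB` because none of its label paths match the footprint, so its term synopsis is never consulted; this deferral is the only sense in which the sketch is "lazy". A real system would presumably use compact, approximate synopses rather than exact sets, which is where approximate answers would come from.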
