XPath lookup queries in P2P networks

We address the problem of querying XML data over a P2P network. In P2P networks, the queries allowed are usually exact-match queries over file names. We discuss the extensions needed to deal with XML data and XPath queries. A single peer can hold a whole document or a fragment of one. Each XML fragment or document is identified by a distinct path expression, which is encoded in a distributed hash table. Our framework differs from content-based routing mechanisms, which are biased towards finding the most relevant peers holding the data. We perform fragment placement and enable fragment lookup by exploiting only the few path expressions stored on each peer. By taking advantage of quasi-zero replication of global catalogs, our system supports fast full and partial XPath querying. To evaluate this approach, we have extended the Chord simulator and carried out an experimental evaluation.
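
The idea of identifying fragments by path expressions stored in a distributed hash table can be illustrated with a minimal sketch. It is not the system described above: the class `ToyRing`, the 16-bit identifier space, the use of SHA-1 as the hash, the peer names, and the sample path `/site/people/person` are all assumptions made for illustration, and the whole ring lives in one process rather than on separate peers. It only covers exact-match lookup of a full path expression, using Chord's rule that a key is stored on the first peer whose identifier is equal to or follows the key's identifier on the ring.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; an assumption to keep the toy ring small


def ring_id(key: str, m: int = M) -> int:
    """Map a string (peer name or path expression) to an m-bit ring identifier."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class ToyRing:
    """Hypothetical in-memory stand-in for a Chord ring (no network, no finger tables)."""

    def __init__(self, peer_names):
        # Peers sorted by ring identifier.
        self.peers = sorted((ring_id(name), name) for name in peer_names)
        self.ids = [pid for pid, _ in self.peers]
        # Per-peer catalog: path expression -> set of peers holding that fragment.
        self.catalog = {name: {} for name in peer_names}

    def successor(self, ident: int) -> str:
        """Return the first peer whose identifier is >= ident, wrapping around the ring."""
        i = bisect_left(self.ids, ident)
        return self.peers[i % len(self.peers)][1]

    def publish(self, path_expr: str, holder: str) -> str:
        """Record that `holder` stores the fragment identified by `path_expr`."""
        responsible = self.successor(ring_id(path_expr))
        self.catalog[responsible].setdefault(path_expr, set()).add(holder)
        return responsible

    def lookup(self, path_expr: str) -> set:
        """Exact-match lookup: ask the responsible peer who holds the fragment."""
        responsible = self.successor(ring_id(path_expr))
        return self.catalog[responsible].get(path_expr, set())


ring = ToyRing(["peerA", "peerB", "peerC"])
ring.publish("/site/people/person", "peerB")
holders = ring.lookup("/site/people/person")
```

In this sketch each peer stores only the catalog entries whose keys hash to it, so no global catalog is replicated. Partial XPath queries would need additional machinery that the sketch does not attempt.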
