Peer-to-peer management of XML data: issues and research challenges

Peer-to-peer (p2p) systems are attracting increasing attention as an efficient means of sharing data among large, diverse and dynamic sets of users. The widespread use of XML as a standard for representing and exchanging data in the Internet suggests using XML for describing data shared in a p2p system. However, sharing XML data imposes new challenges in p2p systems related to supporting advanced querying beyond simple keyword-based retrieval. In this paper, we focus on data management issues for processing XML data in a p2p setting, namely indexing, replication, clustering and query routing and processing. For each of these topics, we present the issues that arise, survey related research and highlight open research problems.

[1]  Peter Triantafillou,et al.  Towards High Performance Peer-to-Peer Content and Resource Sharing Systems , 2003, CIDR.

[2]  Renée J. Miller,et al.  Mapping data in peer-to-peer systems: semantics and algorithmic issues , 2003, SIGMOD '03.

[3]  Alon Y. Halevy,et al.  Efficient query reformulation in peer data management systems , 2004, SIGMOD '04.

[4]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[5]  C. M. Sperberg-McQueen,et al.  Extensible Markup Language (XML) , 1997, World Wide Web J..

[6]  Donald D. Chamberlin XQuery: An XML query language , 2002, IBM Syst. J..

[7]  Wolfgang Nejdl,et al.  A scalable and ontology-based P2P infrastructure for Semantic Web Services , 2002, Proceedings. Second International Conference on Peer-to-Peer Computing,.

[8]  Charu C. Aggarwal,et al.  XRules: an effective structural classifier for XML data , 2003, KDD '03.

[9]  Mary Baker,et al.  CUP: Controlled Update Propagation in Peer-to-Peer Networks , 2003, USENIX Annual Technical Conference, General Track.

[10]  Roy Goldman,et al.  DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases , 1997, VLDB.

[11]  Edith Cohen,et al.  Replication strategies in unstructured peer-to-peer networks , 2002, SIGCOMM.

[12]  Qiang Wang A Data Locating Mechanism for Distributed XML Data over P2P Networks , 2004 .

[13]  Dan Suciu,et al.  Distributed query evaluation on semistructured data , 2002, TODS.

[14]  Keishi Tajima,et al.  Answering XPath Queries over Networks by Sending Minimal Views , 2004, VLDB.

[15]  Michael Gertz,et al.  On Distributing XML Repositories , 2003, WebDB.

[16]  Ting Yu,et al.  Routing XML queries , 2004, Proceedings. 20th International Conference on Data Engineering.

[17]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[18]  C. M. Sperberg-McQueen,et al.  eXtensible Markup Language (XML) 1.0 (Second Edition) , 2000 .

[19]  Hector Garcia-Molina,et al.  Open Problems in Data-Sharing Peer-to-Peer Systems , 2003, ICDT.

[20]  Edith Cohen,et al.  Search and replication in unstructured peer-to-peer networks , 2002, ICS '02.

[21]  Michael Klein Interpreting XML via an RDF Schema , 2002, SAAKM@ECAI.

[22]  Min Cai,et al.  RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network , 2004, WWW '04.

[23]  Gurmeet Singh Manku,et al.  SETS: search enhanced by topic segmentation , 2003, SIGIR.

[24]  Karl Aberer,et al.  P-Grid: a self-organizing structured P2P system , 2003, SGMD.

[25]  David J. DeWitt,et al.  Locating Data Sources in Large Distributed Systems , 2003, VLDB.

[26]  Siu-Ming Yiu,et al.  An efficient and scalable algorithm for clustering XML documents by structure , 2004, IEEE Transactions on Knowledge and Data Engineering.

[27]  Vagelis Hristidis,et al.  Keyword proximity search on XML graphs , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[28]  Gerhard Weikum,et al.  Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML Data , 2003, WebDB.

[29]  Wolfgang Nejdl,et al.  Information Integration in Schema-Based Peer-To-Peer Networks , 2003, CAiSE.

[30]  Evaggelia Pitoura,et al.  Content-Based Overlay Networks for XML Peers Based on Multi-level Bloom Filters , 2003, DBISP2P.

[31]  Ben Y. Zhao,et al.  An architecture for a secure service discovery service , 1999, MobiCom.

[32]  Alfredo Cuzzocrea,et al.  XPath lookup queries in P2P networks , 2004, WIDM '04.

[33]  Sandhya Dwarkadas,et al.  Peer-to-peer information retrieval using self-organizing semantic overlay networks , 2003, SIGCOMM '03.

[34]  Hector Garcia-Molina,et al.  Semantic Overlay Networks for P2P Systems , 2004, AP2PC.

[35]  Ioana Manolescu,et al.  Dynamic XML documents with distribution and replication , 2003, SIGMOD '03.

[36]  Steven J. DeRose,et al.  Extensible Markup Language (XML) Part 2: Linking , 1997, World Wide Web J..

[37]  Manish Parashar,et al.  Flexible information discovery in decentralized distributed systems , 2003, High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on.

[38]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[39]  Wolfgang Nejdl,et al.  Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks , 2003, WWW '03.

[40]  Heiner Stuckenschmidt,et al.  Index structures and algorithms for querying distributed RDF repositories , 2004, WWW '04.

[41]  David J. DeWitt,et al.  Processing Queries in a Large Peer-to-Peer System , 2003, CAiSE.

[42]  Mong-Li Lee,et al.  XClust: clustering XML schemas for effective integration , 2002, CIKM '02.

[43]  Tim Moors,et al.  Survey of Research towards Robust Peer-to-Peer Networks: Search Methods , 2007, RFC.

[44]  Vassilis Christophides,et al.  Semantic Query Routing and Processing in P2P Database Systems: The ICS-FORTH SQPeer Middleware , 2004, EDBT Workshops.

[45]  Yehoshua Sagiv,et al.  XSEarch: A Semantic Search Engine for XML , 2003, VLDB.

[46]  Hector Garcia-Molina,et al.  Routing indices for peer-to-peer systems , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[47]  David Maier,et al.  Distributed Query Processing and Catalogs for Peer-to-Peer Systems , 2003, CIDR.

[48]  Renée J. Miller,et al.  Data mapping in peer-to-peer systems: Semantics and algorithmic issues , 2003, SIGMOD 2003.

[49]  Neoklis Polyzotis,et al.  Structure and Value Synopses for XML Data Graphs , 2002, VLDB.

[50]  Paolo Manghi,et al.  XPeer: A Self-Organizing XML P2P Database System , 2004, EDBT Workshops.

[51]  Evaggelia Pitoura,et al.  Content-Based Routing of Path Queries in Peer-to-Peer Systems , 2004, EDBT.