Query Processing in RDF/S-Based P2P Database Systems

In peer-to-peer (P2P) systems, a very large number of autonomous computing nodes (the peers) pool their resources and rely on each other for data and services. P2P data management systems increasingly rely on intensional (i.e., schema) information for integrating and querying peer bases. Such information can be easily captured by emerging Semantic Web languages such as RDF/S. In this chapter, we present the SQPeer middleware for processing RQL queries over peer RDF/S bases (materialized or virtual), which are advertised using adequate RVL views. The novelty of SQPeer lies in the interleaved execution of the query routing and planning phases, using intensional advertisements of peer bases in the form of RDF/S schema fragments (i.e., views). More precisely, routing is responsible for discovering the peer views relevant to a specific query, based on appropriate subsumption techniques for RDF/S schema fragments. Query planning, in turn, relies on the data localization information obtained by routing, as well as on compile-time and run-time optimization techniques. The generated plans are then executed in a fully distributed way by the involved peers, so that the first results of a query available in peer bases are obtained as fast as possible. This is achieved by first considering the peer bases that can answer the whole query and then, at each iteration round, routing and evaluating smaller query fragments. The interleaved execution not only favors intra-peer processing, which is less expensive than inter-peer processing, but also benefits from the parallel execution of query routing, planning and execution in different peers. Peers can exchange query plans and results, as well as revisit established plans, using appropriate communication channels. Finally, we demonstrate through examples the execution of the two main query processing phases for two different architectural alternatives, namely a hybrid and a structured RDF/S schema-based P2P system.
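
The routing and iteration idea described above can be illustrated with a small sketch. It is not SQPeer's algorithm or API: it assumes that a query and a peer view can each be reduced to a set of property names, that subsumption is a walk up a toy subproperty hierarchy, and that fragments are simply subsets of the query's properties. All names (`SUBPROPERTY_OF`, `subsumes`, `route`, `rounds`, the peers `P1` to `P3`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy schema: child property -> parent property.
SUBPROPERTY_OF = {"paints": "creates", "sculpts": "creates"}


def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = SUBPROPERTY_OF.get(specific)
    return False


def route(query, views):
    """Map each query property to the peers whose advertised view is relevant."""
    return {
        prop: {
            peer
            for peer, advertised in views.items()
            if any(subsumes(prop, a) for a in advertised)
        }
        for prop in query
    }


def rounds(query, views):
    """Yield (fragment, peers) pairs, largest fragments first.

    The first round looks for peers that can answer the whole query;
    later rounds consider progressively smaller fragments.
    """
    annotations = route(query, views)
    for size in range(len(query), 0, -1):
        for fragment in combinations(query, size):
            peers = set.intersection(*(annotations[p] for p in fragment))
            if peers:
                yield fragment, sorted(peers)


# Assumed advertisements: each peer publishes the properties it populates.
views = {
    "P1": {"paints", "exhibited"},
    "P2": {"sculpts"},
    "P3": {"exhibited"},
}
query = ("creates", "exhibited")

for fragment, peers in rounds(query, views):
    print(fragment, peers)
```

In this toy setting only `P1` can answer the whole query, so it is considered first; the single-property fragments then bring in `P2` and `P3`, whose partial results would have to be joined across peers.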
