Distributed Query Processing in an Ad-hoc Semantic Web Data Sharing System

Sharing Semantic Web data that is encoded as RDF triples and held in proprietary datasets across a decentralized environment calls for efficient support from distributed computing technologies. The highly dynamic ad-hoc settings that are expected to become pervasive for Semantic Web data sharing among personal users pose even more demanding challenges for these enabling technologies. We extend previous work on a hybrid P2P architecture for an ad-hoc Semantic Web data sharing system. The architecture better models the data sharing scenario by allowing data to be maintained by its own providers, and it exhibits satisfactory scalability owing to a two-level distributed index and hashing techniques. We also propose efficient distributed processing of SPARQL queries in this context and explore optimization techniques that build on distributed query processing for database systems and on relational algebra optimization. We anticipate that our work will become an indispensable, complementary approach to making the Semantic Web a reality by delivering efficient data sharing and reuse in ad-hoc environments.
