Scalable discovery of networked data: Algorithms, Infrastructure, Applications

The OpenKnowledge project aims at knowledge sharing through open and flexible peer interactions. Within this project, we are developing a system that supports searching for, developing, and sharing interactions (workflows) consisting of roles, implemented by software that peers can share and execute. Part of this system is a discovery service, which is the focus of this chapter. This service aims to meet the requirements of such an open system with a Peer-to-Peer architecture, using Distributed Hash Tables (DHTs) to achieve robustness through redundancy and scalability through decentralization. Resources are discovered using a set of attribute-value pairs. A straightforward DHT-based approach that builds a distributed inverted index requires a number of messages and replicas that grows linearly with the number of attributes. We aim to reduce this number with an efficient multi-attribute routing algorithm. We emulate and test our implementation on the DAS-2 distributed supercomputer.
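The straightforward baseline mentioned above, a distributed inverted index over attribute-value pairs, can be sketched as follows. This is a minimal single-process sketch under several assumptions:

- It uses a Chord-style consistent-hashing ring with SHA-1 identifiers.
- It counts one message per put or get and ignores the multi-hop overlay routing a real DHT would add.
- `ToyDHT`, `publish`, `query` and `postings` are hypothetical names.

The sketch shows only the naive index, not the multi-attribute routing algorithm proposed in this chapter. A resource described by k attribute-value pairs is stored under k keys, so publishing it costs k messages and creates k replicas.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def ring_id(text: str, bits: int = 32) -> int:
    """Map a string onto a 2**bits identifier ring using SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT: each key lives on its
    successor node on the ring, as in consistent hashing."""

    def __init__(self, node_names):
        self.node_ids = sorted(ring_id(n) for n in node_names)
        # node id -> key -> set of resource ids (the posting list)
        self.store = defaultdict(lambda: defaultdict(set))
        self.messages = 0  # one message per put/get; overlay hops ignored

    def successor(self, key_id: int) -> int:
        """First node id >= key_id, wrapping around the ring."""
        i = bisect_left(self.node_ids, key_id)
        return self.node_ids[i % len(self.node_ids)]

    def put(self, key: str, resource_id: str) -> None:
        self.messages += 1
        self.store[self.successor(ring_id(key))][key].add(resource_id)

    def get(self, key: str) -> set:
        self.messages += 1
        return set(self.store[self.successor(ring_id(key))].get(key, set()))

    def postings(self) -> int:
        """Total (key, resource) entries stored, i.e. index replicas."""
        return sum(len(ids) for keys in self.store.values() for ids in keys.values())


def publish(dht: ToyDHT, resource_id: str, attributes: dict) -> None:
    """Naive inverted index: one posting (and one message) per pair."""
    for attr, value in attributes.items():
        dht.put(f"{attr}={value}", resource_id)


def query(dht: ToyDHT, conditions: dict) -> set:
    """Conjunctive query: intersect the posting lists of all pairs."""
    result = None
    for attr, value in conditions.items():
        found = dht.get(f"{attr}={value}")
        result = found if result is None else result & found
    return result or set()


if __name__ == "__main__":
    dht = ToyDHT([f"peer{i}" for i in range(16)])
    publish(dht, "service-A", {"type": "weather", "format": "xml", "region": "EU"})
    publish(dht, "service-B", {"type": "weather", "format": "json", "region": "EU"})
    print(dht.messages)   # 6: three pairs per resource
    print(dht.postings())  # 6: one replica per pair
    print(sorted(query(dht, {"type": "weather", "region": "EU"})))  # ['service-A', 'service-B']
```

Adding a fourth attribute to each description would raise both counts to 8. This per-attribute cost is what a multi-attribute routing scheme tries to avoid.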
