Information Integration in Schema-Based Peer-To-Peer Networks

Peer-to-peer (P2P) networks have become an important infrastructure in recent years. Using P2P networks for distributed information systems allows us to shift the focus from centrally organized to distributed information systems in which all peers can both provide and access information. In previous papers we described Edutella, an RDF-based P2P infrastructure that is a specific example of a more advanced approach to P2P networks called schema-based peer-to-peer networks. Schema-based P2P networks have a number of advantages over simpler P2P networks such as Napster or Gnutella. Instead of prescribing one global schema to describe content, they support arbitrary metadata schemas and ontologies (crucial for the Semantic Web). They thereby allow complex and extensible descriptions of resources, replacing the fixed and limited descriptions of earlier systems with dynamic ones, and they can provide complex query facilities over these metadata instead of simple keyword-based search. In this paper we elaborate on topologies, indices and query routing strategies for efficient query distribution in such networks. Our work is based on the concept of super-peer networks, which scale better than traditional P2P networks. As we show in this paper, adapting existing concepts from mediator-based information systems to super-peer-based networks enables these networks to support sophisticated routing, clustering and mediation strategies based on metadata schemas and attributes. The resulting routing indices can be built using local clustering policies and can support local mediation and transformation rules between heterogeneous schemas; we also sketch some first ideas for implementing these advanced functionalities.
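The combination of schema-based routing indices and mediation rules described above can be illustrated with a minimal sketch. It assumes a super-peer that indexes each attached peer by the metadata properties it registers and applies simple one-to-one property mappings as its mediation rules. The class `SuperPeer`, its methods, the property names (`dc:title`, `lom:title`, ...) and the mapping table are hypothetical illustrations, not Edutella's actual API or index structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SuperPeer:
    """Hypothetical super-peer with a property-level routing index.

    `index` maps each attached peer to the metadata properties it has
    registered; `mappings` holds one-to-one mediation rules that translate
    a property of one schema into an equivalent property of another.
    """
    name: str
    index: Dict[str, Set[str]] = field(default_factory=dict)
    mappings: Dict[str, str] = field(default_factory=dict)

    def register(self, peer: str, properties: Set[str]) -> None:
        # A peer announces the schema properties its metadata uses.
        self.index.setdefault(peer, set()).update(properties)

    def rewrite(self, query: Set[str], supported: Set[str]) -> Optional[Set[str]]:
        # Keep properties the peer supports directly, translate the rest via
        # the mapping table, and give up if any property cannot be covered.
        rewritten = set()
        for prop in query:
            if prop in supported:
                rewritten.add(prop)
            elif self.mappings.get(prop) in supported:
                rewritten.add(self.mappings[prop])
            else:
                return None
        return rewritten

    def route(self, query: Set[str]) -> Dict[str, Set[str]]:
        # Forward the (possibly rewritten) query only to peers whose indexed
        # properties cover it, instead of flooding every attached peer.
        plan = {}
        for peer in sorted(self.index):
            rewritten = self.rewrite(query, self.index[peer])
            if rewritten is not None:
                plan[peer] = rewritten
        return plan


# Example: P1 uses Dublin Core-style names, P2 LOM-style names,
# and P3 only indexes dc:subject (all names are illustrative).
sp = SuperPeer("SP1", mappings={"dc:title": "lom:title"})
sp.register("P1", {"dc:title", "dc:creator"})
sp.register("P2", {"lom:title"})
sp.register("P3", {"dc:subject"})

print(sp.route({"dc:title"}))  # {'P1': {'dc:title'}, 'P2': {'lom:title'}}
print(sp.route({"dc:title", "dc:creator"}))
```

Indexing at the level of individual properties is only one possible granularity: a super-peer could index whole schemas (a smaller but coarser index) or property values (finer but larger). The sketch also leaves out forwarding queries between super-peers.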
