Querying structured data in an unstructured P2P system

Peer-to-peer (P2P) networking has become a major research topic over the last few years. Sharing structured data in such decentralized environments is a challenging problem, especially in the absence of a global schema. The standard practice of answering a query that has been consecutively rewritten along its propagation path often results in significant loss of information. In this paper, we present an adaptive and bandwidth-efficient solution to this problem in the context of an unstructured, purely decentralized system. Our method allows peers to individually choose which rewritten version of a query to answer and to discover information-rich sources that would otherwise remain hidden. We describe how efficient query routing and clustering of peers, relying on normal query traffic only, can be used to produce high-quality answers. Simulation results show that our technique is both effective and bandwidth-efficient across a variety of workloads and network sizes.
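To make the information-loss problem concrete, the following minimal sketch shows how a query can shrink as it is rewritten hop by hop. It is an illustration only, not the method of the paper: the representation of a query as a set of attribute names, the pairwise mappings, and the names `rewrite` and `propagate` are all assumptions introduced here.

```python
from typing import Dict, List, Set

# Hypothetical pairwise schema mapping: attribute name at the sending peer
# -> attribute name at the receiving peer. Not taken from the paper.
Mapping = Dict[str, str]


def rewrite(query: Set[str], mapping: Mapping) -> Set[str]:
    """Translate a query into the next peer's schema.

    Attributes that have no counterpart in the mapping are dropped,
    which is where information is lost.
    """
    return {mapping[attr] for attr in query if attr in mapping}


def propagate(query: Set[str], path: List[Mapping]) -> List[Set[str]]:
    """Return every version of the query along a propagation path.

    The first element is the original query; each later element is the
    previous version rewritten through the next mapping on the path.
    """
    versions = [set(query)]
    for mapping in path:
        versions.append(rewrite(versions[-1], mapping))
    return versions


# Illustrative three-peer path A -> B -> C with made-up attribute names.
a_to_b = {"title": "name", "author": "writer"}  # "year" has no counterpart at B
b_to_c = {"name": "label"}                      # "writer" has no counterpart at C

versions = propagate({"title", "author", "year"}, [a_to_b, b_to_c])
for hop, version in enumerate(versions):
    print(hop, sorted(version))
```

In this toy setting the query asks for three attributes at the origin but only one by the time it reaches the third peer, even if that peer stores data relevant to the original request. Retaining the earlier versions, as `propagate` does, is one simple way to picture a peer having more than one rewritten version available to answer.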
