Scalable summary based retrieval in P2P networks

Much of the current P2P-IR literature focuses on distributed indexing structures. In this paper, we present an approach based on replicating peer data summaries via rumor spreading and multicast in a structured overlay. We describe Rumorama, a P2P framework for similarity queries inspired by GlOSS and CORI and their P2P adaptation, PlanetP. Rumorama organizes PlanetP-like summary-based P2P-IR networks into a hierarchy. In a Rumorama network, each peer views the network as a small PlanetP network with connections to peers that see other small PlanetP networks. Importantly, each peer can choose the size of the PlanetP network it sees according to its local processing power and bandwidth. Even in this adaptive environment, Rumorama processes a query such that, in a network without churn, the summary of each peer is considered exactly once. The number of peers that must actually be contacted for a query, however, is a small fraction of the total number of peers in the network. We present the Rumorama base protocol, as well as experiments demonstrating the scalability and viability of the approach under churn.
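
To make the "exactly once" property concrete, the following is a minimal sketch of one way such a hierarchy could route a query. It is a simplified model under our own assumptions, not the Rumorama protocol itself: peers carry binary IDs, each peer picks a prefix length (`depth`) and is assumed to hold the summaries of all peers sharing that prefix, and a query is forwarded to one peer in each sibling subtree. All names (`Peer`, `find_neighbor`, `route_query`, `simulate`, `ID_BITS`) are hypothetical, and summary contents, ranking, and churn are left out.

```python
import random
from collections import Counter

ID_BITS = 8  # assumed ID length, chosen only for this illustration


class Peer:
    def __init__(self, peer_id, depth):
        self.peer_id = peer_id  # bit string such as "01101001"
        self.depth = depth      # prefix length this peer chose for its leaf net


def flip(bit):
    return "1" if bit == "0" else "0"


def find_neighbor(peer, level, peers, rng):
    """Pick any peer in the sibling subtree that branches off at `level`."""
    prefix = peer.peer_id[:level] + flip(peer.peer_id[level])
    candidates = [p for p in peers if p.peer_id.startswith(prefix)]
    return rng.choice(candidates) if candidates else None


def route_query(peer, level, peers, rng, seen, contacted):
    """Cover every peer whose ID shares the first `level` bits with `peer`."""
    contacted.append(peer.peer_id)
    # Hand each sibling subtree between `level` and our own depth to one neighbor.
    for i in range(level, peer.depth):
        neighbor = find_neighbor(peer, i, peers, rng)
        if neighbor is not None:
            route_query(neighbor, i + 1, peers, rng, seen, contacted)
    # Evaluate locally held summaries, restricted to the subtree assigned to us.
    leaf_prefix = peer.peer_id[:max(level, peer.depth)]
    for p in peers:
        if p.peer_id.startswith(leaf_prefix):
            seen[p.peer_id] += 1


def simulate(num_peers=200, seed=0):
    rng = random.Random(seed)
    ids = rng.sample(range(2 ** ID_BITS), num_peers)
    # Each peer chooses its own depth, mimicking differing local resources.
    peers = [Peer(format(i, "0{}b".format(ID_BITS)), rng.randint(0, 4)) for i in ids]
    seen, contacted = Counter(), []
    route_query(rng.choice(peers), 0, peers, rng, seen, contacted)
    return peers, seen, contacted


if __name__ == "__main__":
    peers, seen, contacted = simulate()
    print("peers:", len(peers), "contacted:", len(contacted))
    print("each summary considered once:", all(seen[p.peer_id] == 1 for p in peers))
```

In this model the sibling subtrees and the local leaf net partition the ID space below the current prefix, so every summary is evaluated once even though peers choose different depths, while only one peer per leaf subtree is contacted.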
