Efficient search in file-sharing networks

To date, the most popular file-sharing applications have used either centralized or flood-based search algorithms. Napster succeeded by providing a centralized index with presumably perfect query recall. Its more distributed successors, such as Gnutella and Kazaa, use flood-based search procedures in which a query is propagated through an unstructured network. Query result quality suffers in such networks: because a search does not cover the entire corpus, queries for rare items have a high probability of returning nothing. In this paper, we present a new implementation of a distributed file-sharing system that yields (1) query result quality better than flooding and close to that of a centralized index, and (2) low network overhead for maintenance. These improvements result from our optimized approaches to (a) high churn rates (clients and servers frequently entering and leaving the system) and (b) skewed workloads (access frequencies that vary widely across keys). High churn rates are addressed by keeping all data in soft state that is periodically refreshed, so that the loss of a server or client is quickly reflected in the indexes; higher refresh rates imply fewer false positives. Skewed workloads are load balanced through a layer of indirection for placing and locating data, so that data is partitioned and distributed according to its frequency of use. A trace-driven evaluation based on Gnutella system traces shows that our prototype implementation consumes little network bandwidth, keeps the ratio of maximum to average load within a factor of three across all servers, and achieves positive recall for over 90% of all queries despite a high churn rate; absent churn, recall would be 100%.
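The soft-state idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `SoftStateIndex`, the `ttl` parameter, and the explicit `now` timestamps are assumptions introduced here for illustration. Each index entry carries an expiry time, a refresh simply re-publishes the entry, and an entry that is not refreshed disappears on its own once the time-to-live has passed.

```python
from dataclasses import dataclass, field


@dataclass
class SoftStateIndex:
    """Hypothetical index whose entries expire unless they are refreshed."""

    ttl: float  # seconds an entry survives without a refresh (assumed parameter)
    _entries: dict = field(default_factory=dict)  # (key, holder) -> expiry time

    def publish(self, key, holder, now):
        """Insert or refresh the claim that `holder` shares `key` at time `now`."""
        self._entries[(key, holder)] = now + self.ttl

    def lookup(self, key, now):
        """Return the holders whose entry for `key` has not yet expired."""
        return sorted(
            holder
            for (k, holder), expiry in self._entries.items()
            if k == key and expiry > now
        )

    def expire(self, now):
        """Drop stale entries and return how many were removed."""
        stale = [entry for entry, expiry in self._entries.items() if expiry <= now]
        for entry in stale:
            del self._entries[entry]
        return len(stale)


index = SoftStateIndex(ttl=30.0)
index.publish("song.mp3", "peerA", now=0.0)
index.publish("song.mp3", "peerB", now=0.0)
# peerA refreshes at t=20; peerB has left the network and never refreshes.
index.publish("song.mp3", "peerA", now=20.0)

before_expiry = index.lookup("song.mp3", now=25.0)  # peerB is still listed
after_expiry = index.lookup("song.mp3", now=35.0)   # peerB has timed out
removed = index.expire(now=35.0)
```

In this sketch the departed peer remains a false positive only until its entry times out, which is why a shorter refresh interval (and a correspondingly shorter time-to-live) reduces false positives at the cost of more refresh traffic.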
