Assisted Peer-to-Peer Search with Partial Indexing

In the past few years, peer-to-peer (P2P) networks have emerged as a promising paradigm for building a wide variety of distributed systems and applications. The most popular P2P application to date is file sharing, e.g., Gnutella and KaZaA. These systems are usually referred to as unstructured, and search in unstructured P2P networks typically relies on flooding or random walks. In structured P2P networks such as distributed hash tables (DHTs), on the other hand, search is usually performed by looking up a distributed inverted index. The efficiency of the search mechanism is key to the scalability of a P2P content-sharing system, yet so far neither unstructured nor structured P2P networks alone solve the search problem satisfactorily. In this paper, we propose to combine the strengths of both to achieve more efficient search. Specifically, we enhance search in unstructured P2P overlay networks by building a partial index of shared data on top of a structured P2P network. The index maintains two types of information, both characterized by data properties: the top interests of peers and globally unpopular data. The proposed search protocol, assisted search with partial indexing, uses the index to improve search in three ways. First, the index helps peers find other peers with similar interests, so that the unstructured search overlay is formed to reflect peer interests. Second, by exploiting peer interest locality, the index provides search hints for data that are difficult to locate, and these hints can be used for a second-chance search. Third, the index helps locate unpopular data items. Experiments based on a P2P file-sharing trace show that assisted search with a lightweight partial indexing service achieves a significantly higher success rate in locating data than both Gnutella and a hit-rate-based protocol for unstructured P2P systems, while incurring low search latency and overhead.
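
The three ways the index assists search can be pictured as a staged lookup. The Python sketch below is a hypothetical, single-process illustration and not the authors' protocol. It rests on the following assumptions:

- The structured overlay is simulated by hashing each key onto one of a fixed number of index tables. This uses plain modulo hashing rather than a real Chord or Pastry ring.
- Each item carries a single property string, such as a genre.
- "Second-chance search" is simplified to querying the hint peers directly.
- All names are invented for illustration: `PartialIndex`, `assisted_search`, `index_node_for`, `unpopular_threshold` and `build_demo`.

```python
import hashlib
from collections import defaultdict, deque


def index_node_for(key, num_nodes):
    """Hash a key onto one of num_nodes index tables (a stand-in for a DHT lookup)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class PartialIndex:
    """Hypothetical partial index: peer interests and globally unpopular items only."""

    def __init__(self, num_nodes=8, unpopular_threshold=2):
        self.num_nodes = num_nodes
        self.unpopular_threshold = unpopular_threshold
        # One table per simulated index node: property -> interested peers.
        self.interests = [defaultdict(set) for _ in range(num_nodes)]
        # One table per simulated index node: item -> peers holding it.
        self.unpopular = [defaultdict(set) for _ in range(num_nodes)]

    def publish_interest(self, peer, prop):
        self.interests[index_node_for(prop, self.num_nodes)][prop].add(peer)

    def interested_peers(self, prop):
        return set(self.interests[index_node_for(prop, self.num_nodes)].get(prop, ()))

    def publish_if_unpopular(self, item, holders):
        # Only rarely replicated items are indexed, which keeps the index partial.
        if len(holders) <= self.unpopular_threshold:
            self.unpopular[index_node_for(item, self.num_nodes)][item].update(holders)

    def holders_of(self, item):
        return set(self.unpopular[index_node_for(item, self.num_nodes)].get(item, ()))


def flood(overlay, start, item, store, ttl):
    """TTL-limited breadth-first flooding over the unstructured overlay."""
    seen, frontier, messages = {start}, deque([(start, 0)]), 0
    while frontier:
        peer, depth = frontier.popleft()
        if item in store[peer]:
            return peer, messages
        if depth == ttl:
            continue
        for nbr in overlay[peer]:
            if nbr not in seen:
                seen.add(nbr)
                messages += 1
                frontier.append((nbr, depth + 1))
    return None, messages


def assisted_search(overlay, store, index, requester, item, prop, ttl=2):
    """Return (peer holding item or None, stage that found it)."""
    # Stage 1: TTL-limited flooding in the unstructured overlay
    # (assumed to be already clustered by peer interest).
    found, _ = flood(overlay, requester, item, store, ttl)
    if found is not None:
        return found, "flood"
    # Stage 2: second-chance search, querying peers indexed as sharing the item's property.
    for hint in sorted(index.interested_peers(prop)):
        if item in store[hint]:
            return hint, "hint"
    # Stage 3: fall back to the index of globally unpopular items.
    holders = index.holders_of(item)
    if holders:
        return min(holders), "unpopular-index"
    return None, "miss"


def build_demo():
    """A toy six-peer network used to exercise the three stages."""
    overlay = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"], "f": []}
    store = {"a": set(), "b": {"song1"}, "c": {"jazz1", "song1"}, "d": {"jazz2"},
             "e": {"song1"}, "f": {"rare1"}}
    index = PartialIndex()
    index.publish_interest("d", "jazz")
    index.publish_if_unpopular("rare1", {"f"})
    index.publish_if_unpopular("song1", {"b", "c", "e"})  # too popular: not indexed
    return overlay, store, index


if __name__ == "__main__":
    overlay, store, index = build_demo()
    for item, prop in [("song1", "pop"), ("jazz2", "jazz"), ("rare1", "classical"), ("x", "pop")]:
        print(item, assisted_search(overlay, store, index, "a", item, prop))
```

In this toy network, peer "a" behaves as follows:

- It finds `song1` by flooding.
- It finds `jazz2` only through the interest hint pointing at "d", which lies outside its flooding horizon.
- It finds `rare1` only through the unpopular-item index.

Leaving items above `unpopular_threshold` replicas out of the index is one way to keep the index lightweight, as the abstract emphasizes. Popular items are expected to be found by flooding anyway.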
