A case for associative peer to peer overlays

The success of a P2P file-sharing network highly depends on the scalability and versatility of its search mechanism. Two particularly desirable search features are scope (ability to find infrequent items) and support for partial-match queries (queries that contain typos or include a subset of keywords). While centralized-index architectures (such as Napster) can support both these features, existing decentralized architectures seem to support at most one: prevailing protocols (such as Gnutella and FastTrack) support partial-match queries, but since search is unrelated to the query, they have limited scope. Distributed Hash Tables (such as CAN and CHORD) constitute another class of P2P architectures promoted by the research community. DHTs couple index location with the item's hash value and are able to provide scope but can not effectively support partial-match queries; another hurdle in DHT deployment is their tight control the overlay structure and data placement which makes them more sensitive to failures.Associative overlays are a new class of decentralized P2P architectures. They are designed as a collection of unstructured P2P networks (based on popular architectures such as gnutella or FastTrack), and the design retains many of their appealing properties including support for partial match queries, and relative resilience to peer failures. Yet, the search process is orders of magnitude more effective in locating rare items. Our design exploits associations inherent in human selections to steer the search process to peers that are more likely to have an answer to the query.

[1]  Hector Garcia-Molina,et al.  Routing indices for peer-to-peer systems , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[2]  Monika Henzinger,et al.  Finding Related Pages in the World Wide Web , 1999, Comput. Networks.

[3]  Anna R. Karlin,et al.  Spectral analysis of data , 2001, STOC '01.

[4]  Mischa Schwartz,et al.  ACM SIGCOMM computer communication review , 2001, CCRV.

[5]  Richard A. Harshman,et al.  Indexing by latent semantic indexing , 1990 .

[6]  Ravi Kumar,et al.  Recommendation systems: a probabilistic analysis , 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280).

[7]  Jon M. Kleinberg,et al.  Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text , 1998, Comput. Networks.

[8]  Zhichen Xu,et al.  PeerSearch: Efficient Information Retrieval in Peer-to-Peer Networks , 2002 .

[9]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[10]  A. Tomkins,et al.  Spectral filtering for resource discovery , 1998 .

[11]  Douglas B. Terry,et al.  Using collaborative filtering to weave an information tapestry , 1992, CACM.

[12]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[13]  Edith Cohen,et al.  Search and replication in unstructured peer-to-peer networks , 2002, ICS '02.

[14]  Ravi Kumar,et al.  Extracting Large-Scale Knowledge Bases from the Web , 1999, VLDB.

[15]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[16]  Ben Y. Zhao,et al.  Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and , 2001 .

[17]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[18]  Edith Cohen,et al.  Finding interesting associations without support pruning , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[19]  Amos Fiat,et al.  Censorship resistant peer-to-peer content addressable networks , 2002, SODA '02.

[20]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[21]  Bruce M. Maggs,et al.  Enabling efficient content location and retrieval in peer-to-peer systems by exploiting locality in interests , 2002, CCRV.

[22]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[23]  Edith Cohen,et al.  Replication strategies in unstructured peer-to-peer networks , 2002, SIGCOMM.

[24]  Ben Y. Zhao,et al.  An Infrastructure for Fault-tolerant Wide-area Location and Routing , 2001 .

[25]  Krishna Bharat,et al.  Improved algorithms for topic distillation in a hyperlinked environment , 1998, SIGIR '98.

[26]  Antony I. T. Rowstron,et al.  Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.

[27]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.