Associative search in peer to peer networks: Harnessing latent semantics

The success of a P2P file-sharing network depends heavily on the scalability and versatility of its search mechanism. Two particularly desirable search features are scope (the ability to find infrequent items) and support for partial-match queries (queries that contain typos or include only a subset of keywords). While centralized-index architectures (such as Napster) can support both features, existing decentralized architectures seem to support at most one. Prevailing unstructured P2P protocols (such as Gnutella and FastTrack) deploy a "blind" search mechanism in which the set of peers probed is unrelated to the query; thus they support partial-match queries but have limited scope. At the other extreme, the recently proposed distributed hash tables (DHTs), such as CAN and Chord, couple index location with the item's hash value, and thus have good scope but cannot effectively support partial-match queries. Another hurdle to DHT deployment is their tight control of the overlay structure and of the information (part of the index) each peer maintains, which makes them more sensitive to failures and to frequent joins and disconnects. We develop a new class of decentralized P2P architectures. Our design is based on unstructured architectures such as Gnutella and FastTrack, and retains many of their appealing properties, including support for partial-match queries and relative resilience to peer failures. Yet we obtain orders-of-magnitude improvement in the efficiency of locating rare items. Our approach exploits associations inherent in human selections to steer the search process to peers that are more likely to have an answer to the query. We demonstrate the potential of associative search using models, analysis, and simulations.
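The contrast between blind probing and association-guided probing can be illustrated with a small toy sketch. This is not the protocol developed in the paper. It assumes, purely for illustration, that a querying peer ranks other peers by how many items they share with its own library and probes them in that order. The peer names, item names, and both helper functions are hypothetical.

```python
from typing import Dict, List, Optional, Set


def probe_order_by_overlap(my_items: Set[str], peers: Dict[str, Set[str]]) -> List[str]:
    """Order peers by how many items they share with the querier.

    Ties are broken by peer name so the order is deterministic.
    (Illustrative ranking rule, assumed here; not taken from the paper.)
    """
    return sorted(peers, key=lambda p: (-len(my_items & peers[p]), p))


def probes_until_hit(order: List[str], peers: Dict[str, Set[str]], wanted: str) -> Optional[int]:
    """Return how many peers are probed until one holding `wanted` is found.

    Returns None if no peer in `order` holds the item.
    """
    for count, peer in enumerate(order, start=1):
        if wanted in peers[peer]:
            return count
    return None


# Hypothetical toy network: most peers hold popular items, one holds a rare item.
PEERS: Dict[str, Set[str]] = {
    "p1": {"pop_hit_a", "pop_hit_b"},
    "p2": {"pop_hit_a", "pop_hit_c"},
    "p3": {"pop_hit_b", "pop_hit_c"},
    "p4": {"jazz_a", "jazz_b", "jazz_rare"},
    "p5": {"pop_hit_a"},
}
MY_ITEMS: Set[str] = {"jazz_a", "jazz_b"}  # the querier's own library

# "Blind" search: a fixed probe order that ignores both the query and the querier.
blind_order = sorted(PEERS)
# Association-guided search: probe peers with similar libraries first.
guided_order = probe_order_by_overlap(MY_ITEMS, PEERS)

if __name__ == "__main__":
    print("blind probes: ", probes_until_hit(blind_order, PEERS, "jazz_rare"))
    print("guided probes:", probes_until_hit(guided_order, PEERS, "jazz_rare"))
```

In this toy data the rare item sits with the one peer whose library overlaps the querier's, so the guided order reaches it sooner than the fixed order does. Because each probed peer still matches the query against its own items, a partial-match test could replace the exact membership check without changing the probing logic.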
