Proof: A DHT-Based Peer-to-Peer Search Engine

In this paper we focus on building a large-scale keyword search service over structured peer-to-peer (P2P) networks. Current state-of-the-art keyword search approaches for structured P2P systems are based on inverted list intersection. The biggest challenge for these approaches is that, when the indices are distributed over peers, even a simple query may cause a large amount of data to be transmitted over the network. We propose a new P2P keyword search scheme, called "Proof", to reduce the network traffic generated by queries. The key idea is to store a content summary for each Web page in the inverted list, so that a query can be processed by transmitting only a small set of candidate results. Our simulation results show that, compared with previous DHT-based P2P systems, Proof can dramatically reduce network traffic and computation time. It provides 100% precision and 90.09% recall of search results, at an acceptable cost in storage overhead, even as the number of peers and documents continues to increase.
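
The key idea can be illustrated with a minimal sketch. The abstract does not say what a content summary contains, so the sketch assumes it is the set of a page's `k` most frequent terms; `summarize`, `build_index`, `search`, and `k` are hypothetical names chosen for illustration, not the paper's design. Each posting carries the page's summary, so the peer holding the first query term can filter candidates locally instead of shipping whole inverted lists to other peers for intersection.

```python
from collections import Counter


def summarize(text, k):
    """Assumed summary: the k most frequent terms (ties broken alphabetically)."""
    counts = Counter(text.lower().split())
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return frozenset(ranked[:k])


def build_index(pages, k=2):
    """Map each term to a posting list of (page_id, summary) pairs.

    In a DHT, the posting list for a term would live on the peer
    responsible for hash(term); a plain dict stands in for that here.
    """
    index = {}
    for page_id, text in pages.items():
        summary = summarize(text, k)
        for term in set(text.lower().split()):
            index.setdefault(term, []).append((page_id, summary))
    return index


def search(index, query_terms):
    """Answer a multi-term query from the first term's posting list only.

    The remaining terms are checked against each posting's summary, so no
    other inverted list has to cross the network.
    """
    first, rest = query_terms[0], query_terms[1:]
    return sorted(
        page_id
        for page_id, summary in index.get(first, [])
        if all(term in summary for term in rest)
    )


pages = {
    "p1": "dht search dht search engine",
    "p2": "dht routing dht lookup",
    "p3": "search engine ranking engine search crawler index",
}
index = build_index(pages, k=2)
print(search(index, ["dht", "search"]))
print(search(index, ["search", "engine"]))
```

Under this assumed summary every returned page really contains all query terms, because a summary only holds terms that occur in the page. A page can still be missed when a query term falls outside its summary: `p1` contains "engine" but is not returned for the second query. That trade-off is consistent in kind with the full precision and partial recall reported above, though the paper's actual summary construction may differ.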
