Towards feasibility and scalability of text search in peer-to-peer systems

In this paper, we introduce a search engine, Dgoogle, designed for large scale P2P systems. Dgoogle is purely text- based, does not organize documents based on pre-defined keywords or based on their semantics. It is simple to implement and can tolerate variations in the wording of text queries. Compared to existing proposals, such as Inverted Indices, Dgoogle does not stress network bandwidth and offers an order of magnitude of savings in storage overhead and in the response time to user queries. Furthermore, Dgoogle's performance is not affected by long queries or by processing popular query words. Simulation results validate the efficacy of our proposal.

[1]  Karl Aberer,et al.  ALVIS peers: a scalable full-text peer-to-peer retrieval engine , 2006, P2PIR '06.

[2]  Hector Garcia-Molina,et al.  Semantic Overlay Networks for P2P Systems , 2004, AP2PC.

[3]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[4]  David R. Karger,et al.  Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems , 2004, IPTPS.

[5]  Edith Cohen,et al.  Associative search in peer to peer networks: harnessing latent semantics , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[6]  Torsten Suel,et al.  Efficient query evaluation on large textual collections in a peer-to-peer environment , 2005, Fifth IEEE International Conference on Peer-to-Peer Computing (P2P'05).

[7]  Omprakash D. Gnawali A Keyword-Set Search System for Peer-to-Peer Networks , 2002 .

[8]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968 .

[9]  Amin Vahdat,et al.  Efficient Peer-to-Peer Keyword Searching , 2003, Middleware.

[10]  Gurmeet Singh Manku,et al.  Decentralized algorithms using both local and random probes for P2P load balancing , 2005, SPAA '05.

[11]  Gerhard Weikum,et al.  Improving collection selection with overlap awareness in P2P search engines , 2005, SIGIR '05.

[12]  Craig Silverstein,et al.  Analysis of a Very Large Altavista Query Log" SRC Technical note #1998-14 , 1998 .

[13]  Bruce M. Maggs,et al.  Efficient content location using interest-based locality in peer-to-peer systems , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[14]  Richard P. Martin,et al.  PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities , 2003, High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on.

[15]  Torsten Suel,et al.  ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval , 2003, WebDB.

[16]  Hector Garcia-Molina,et al.  Improving search in peer-to-peer networks , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[17]  Sandhya Dwarkadas,et al.  Peer-to-peer information retrieval using self-organizing semantic overlay networks , 2003, SIGCOMM '03.

[18]  Dmitri Loguinov,et al.  Graph-theoretic analysis of structured peer-to-peer systems: routing distances and fault resilience , 2003, IEEE/ACM Transactions on Networking.

[19]  Malcolm C. Harrison,et al.  Implementation of the substring test by hashing , 1971, CACM.

[20]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[21]  George Giakkoupis,et al.  A scheme for load balancing in heterogenous distributed hash tables , 2005, PODC '05.

[22]  David R. Karger,et al.  On the Feasibility of Peer-to-Peer Web Indexing and Search , 2003, IPTPS.

[23]  Dimitrios Gunopulos,et al.  A local search mechanism for peer-to-peer networks , 2002, CIKM '02.

[24]  Hector Garcia-Molina,et al.  Efficient search in peer to peer networks , 2004 .

[25]  Donald E. Knuth,et al.  The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition , 1997 .

[26]  Ben Y. Zhao,et al.  An Infrastructure for Fault-tolerant Wide-area Location and Routing , 2001 .

[27]  Donald E. Knuth,et al.  The art of computer programming: V.1.: Fundamental algorithms , 1997 .

[28]  Alan L. Tharp File organization and processing , 1988 .