DHT Based Searching Improved by Sliding Window

Efficient full-text searching is a major challenge in Peer-to-Peer (P2P) systems. Recently, the Distributed Hash Table (DHT) has become one of the reliable communication schemes for P2P. Some research efforts perform keyword searching and result intersection on a DHT substrate, which requires two or more search requests for each multi-keyword query. This article proposes a Sliding Window improved Multi-keyword Searching method (SWMS) to index and search full text for short queries on a DHT. The main assumptions behind SWMS are: (1) the query overhead of standard inverted list intersection is prohibitive in a distributed P2P system; (2) in most documents relevant to a multi-keyword query, those keywords appear near each other. The experimental results demonstrate that our method preserves search quality while reducing communication cost.
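
The idea behind assumption (2) can be illustrated with a minimal sketch. It is not the SWMS implementation: the abstract does not specify the window size, the key construction, or the ranking, so everything below is an assumption made for illustration. A plain dictionary stands in for the DHT, `ToyIndex`, `window_pairs`, and `dht_key` are hypothetical names, and the window of 3 tokens is arbitrary. Each document publishes its single terms and every pair of terms that co-occur inside the sliding window, so a two-keyword query needs one lookup instead of two lookups plus an intersection.

```python
import hashlib
from collections import defaultdict


def dht_key(terms):
    """Hash a set of terms to a hex string standing in for a DHT identifier."""
    joined = "\x00".join(sorted(terms))  # sort so keyword order does not matter
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def window_pairs(tokens, window):
    """Yield each unordered pair of distinct terms found within `window` consecutive tokens."""
    seen = set()
    for i, left in enumerate(tokens):
        # Tokens that share a window with `left` and come after it.
        for right in tokens[i + 1 : i + window]:
            if left == right:
                continue
            pair = tuple(sorted((left, right)))
            if pair not in seen:
                seen.add(pair)
                yield pair


class ToyIndex:
    """In-memory stand-in for a DHT-backed index (assumed design, not the paper's)."""

    def __init__(self, window=3):
        self.window = window
        self.store = defaultdict(set)  # key -> document ids; plays the role of DHT nodes

    def publish(self, doc_id, text):
        tokens = text.lower().split()
        for term in set(tokens):
            self.store[dht_key((term,))].add(doc_id)
        for pair in window_pairs(tokens, self.window):
            self.store[dht_key(pair)].add(doc_id)

    def search(self, *terms):
        """Answer a one- or two-keyword query with a single key lookup."""
        key = dht_key(tuple(t.lower() for t in terms))
        return set(self.store.get(key, set()))


index = ToyIndex(window=3)
index.publish("d1", "distributed hash table routing for peers")
index.publish("d2", "hash functions and a lookup table")
near = index.search("hash", "table")  # only d1 has both terms inside one window
```

In this sketch a document that contains both keywords far apart (`d2`) is not returned for the pair query, which is the trade-off implied by assumption (2): fewer messages per query in exchange for missing documents where the keywords are not close together.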