Flexible Bloom Filters for Searching Textual Objects

Efficient object-search mechanisms are essential in large-scale networks. Distributed hash tables (DHTs), a class of peer-to-peer systems, have been studied extensively. In a DHT network, a lookup is guaranteed to find the desired object if it exists. However, multi-word searches generate a large amount of communication traffic. Many studies have tried to reduce this traffic by using Bloom filters, which are space-efficient probabilistic data structures. With such filters, all nodes in a DHT must share the same false positive rate parameter, yet the best false positive rate differs from one node to another. In this paper, we provide a method of determining the best false positive rate, and we introduce a new filter, called a flexible Bloom filter, for which each node can set an approximately optimal false positive rate. Experiments showed that the flexible Bloom filter greatly reduced the traffic.
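
To make the role of the false positive rate concrete, here is a minimal sketch of a standard Bloom filter (not the flexible Bloom filter proposed in the paper) whose size is derived from a chosen false positive rate. The class name `BloomFilter`, the parameters `expected_items` and `fp_rate`, and the SHA-1 double-hashing scheme are assumptions made for illustration only. The sizing uses the textbook formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter sized from a target false positive rate (illustrative)."""

    def __init__(self, expected_items, fp_rate):
        n, p = expected_items, fp_rate
        # Textbook sizing: bits m and number of hash functions k.
        self.m = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Assumption: derive k positions by double hashing two halves of a SHA-1 digest.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every added item; may also be True for items never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def two_word_search(docs_a, docs_b, fp_rate):
    """Sketch of a filter-assisted intersection between two nodes."""
    # Node A sends a filter of its document IDs instead of the full list.
    bf = BloomFilter(max(1, len(docs_a)), fp_rate)
    for doc in docs_a:
        bf.add(doc)
    # Node B returns only IDs that pass the filter (may include false positives).
    candidates = [doc for doc in docs_b if doc in bf]
    # Node A removes the false positives using its exact list.
    exact = set(docs_a)
    return [doc for doc in candidates if doc in exact], len(bf.bits), len(candidates)
```

A lower `fp_rate` makes the filter sent by the first node larger but shrinks the candidate list returned by the second node, and a higher rate does the opposite. That trade-off is why a single rate shared by every node is unlikely to suit all of them.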
