Network Applications of Bloom Filters: A Survey

A Bloom filter is a simple, space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives, but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. This paper surveys the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
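
To make the membership-query behavior concrete, the following is a minimal sketch of a Bloom filter in Python. It is not taken from the paper: the class name `BloomFilter`, the parameters `m` (number of bits) and `k` (number of hash functions), and the use of SHA-256 with double hashing to derive the `k` bit positions are all assumptions chosen for illustration. The false positive estimate uses the standard approximation (1 - e^(-kn/m))^k for n inserted items.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter with m bits and k hash functions (hypothetical names)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                            # number of bits in the filter
        self.k = k                            # number of bit positions set per item
        self.bits = bytearray((m + 7) // 8)   # bit array, initially all zero
        self.count = 0                        # number of add() calls (duplicates are counted)

    def _positions(self, item: str):
        # Assumption: derive k indices from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, nonzero step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set the k bits selected for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item: str) -> bool:
        # True if all k bits are set: the item is "possibly present".
        # False if any bit is clear: the item is definitely absent.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def expected_false_positive_rate(self) -> float:
        # Standard approximation (1 - e^(-k*n/m))^k with n = self.count.
        return (1.0 - math.exp(-self.k * self.count / self.m)) ** self.k


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=3)
    for i in range(100):
        bf.add(f"host-{i}")
    print("host-7" in bf)                     # inserted items are always reported present
    print(bf.expected_false_positive_rate())  # estimated chance a non-member is reported present
```

In this sketch a query never misses an inserted item, while an item that was never inserted is reported present only when all `k` of its bits happen to have been set by other items. Increasing `m` relative to the number of stored items lowers that probability at the cost of more space, which is the trade-off described above.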
