The Bloomier filter: an efficient data structure for static support lookup tables

We introduce the Bloomier filter, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries. Our construction generalizes the classical Bloom filter, an ingenious hashing scheme heavily used in networks and databases, whose main attribute, space efficiency, is achieved at the expense of a tiny false-positive rate. Whereas Bloom filters can handle only set membership queries, our Bloomier filters can deal with arbitrary functions. We give several designs varying in simplicity and optimality, and we provide lower bounds to prove the (near) optimality of our constructions.
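To make the query semantics concrete, here is a minimal Python sketch of a static function table in the spirit of a Bloomier filter. It is not the paper's exact construction. It assumes one common XOR-based approach: each key hashes to k table slots and a q-bit mask, the keys are ordered by repeatedly peeling off a key that owns a slot no remaining key touches, and the table is then filled so that the XOR of a key's slots and its mask equals its value. All names (`slots_and_mask`, `build`, `query`), the use of SHA-256, and the parameters `k`, `q`, and the table size of roughly 2n slots are illustrative assumptions.

```python
import hashlib

def slots_and_mask(key, seed, k, seg, q):
    """Hash key to one slot in each of k disjoint segments, plus a q-bit mask."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    # 4 digest bytes per slot and 4 for the mask (requires 4*k + 4 <= 32, q <= 32).
    slots = [i * seg + int.from_bytes(digest[4 * i:4 * i + 4], "big") % seg
             for i in range(k)]
    mask = int.from_bytes(digest[4 * k:4 * k + 4], "big") % (1 << q)
    return slots, mask

def build(f, num_values, q=16, k=3, max_tries=100):
    """Encode dict f (key -> int in [0, num_values)) into a table of q-bit cells."""
    n = len(f)
    seg = max(2, (2 * n) // k + 1)          # about 2n slots in total (assumed slack)
    size = k * seg
    for seed in range(max_tries):
        hashed = {key: slots_and_mask(key, seed, k, seg, q) for key in f}
        touching = [set() for _ in range(size)]
        for key, (slots, _) in hashed.items():
            for s in slots:
                touching[s].add(key)
        # Peel: repeatedly take a slot touched by exactly one remaining key.
        order = []
        pending = [s for s in range(size) if len(touching[s]) == 1]
        while pending:
            s = pending.pop()
            if len(touching[s]) != 1:       # stale entry, slot changed since queued
                continue
            (key,) = touching[s]
            order.append((key, s))
            for t in hashed[key][0]:
                touching[t].discard(key)
                if len(touching[t]) == 1:
                    pending.append(t)
        if len(order) < n:                  # peeling got stuck: retry with new hashes
            continue
        # Fill in reverse peel order so a key's own slot never disturbs keys set earlier.
        table = [0] * size
        for key, s in reversed(order):
            slots, mask = hashed[key]
            acc = f[key] ^ mask
            for t in slots:
                if t != s:
                    acc ^= table[t]
            table[s] = acc
        return table, seed, k, seg, q, num_values
    raise RuntimeError("could not find a peelable hash assignment")

def query(filt, key):
    """Return f(key) for keys in the support; None (usually) for other keys."""
    table, seed, k, seg, q, num_values = filt
    slots, mask = slots_and_mask(key, seed, k, seg, q)
    v = mask
    for t in slots:
        v ^= table[t]
    return v if v < num_values else None

f = {"alice": 0, "bob": 3, "carol": 1, "dave": 2}
filt = build(f, num_values=4)
print({key: query(filt, key) for key in f})   # every key maps back to f[key]
```

Keys in the support always decode to their exact value. A key outside the support decodes to an essentially random q-bit number, so in this sketch it is wrongly reported as a valid value with probability about `num_values / 2**q`, which plays the role of the Bloom filter's false-positive rate.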
