The Bitwise Bloom Filter

We present the Bitwise Bloom Filter, a data structure for maintaining counts for a large number of items. The Bitwise filter extends the Bloom filter, a space-efficient data structure that stores a large set by discarding the identities of the items it holds while still determining, with high probability, whether a given item is in the set. We show how this idea extends to maintaining counts of items by keeping a separate Bloom filter for every bit position in the binary representations of the counts. We give a theoretical analysis of the accuracy of the Bitwise filter and validate it through experiments on real network data.
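To make the one-filter-per-bit-position idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes each count is stored once and never updated, since the abstract does not describe how increments are handled and a plain Bloom filter cannot clear bits. The names `BloomFilter`, `BitwiseCounts`, `store`, `estimate`, and the parameters `width`, `m`, `k`, and `salt` are hypothetical, and the SHA-256-based hashing is an illustrative choice.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: an m-slot bit array probed by k hash functions."""

    def __init__(self, m: int, k: int, salt: int = 0) -> None:
        self.m = m
        self.k = k
        self.salt = salt            # lets each filter hash differently
        self.bits = bytearray(m)    # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive k positions from SHA-256 (an assumption of this sketch).
        for seed in range(self.k):
            key = f"{self.salt}:{seed}:{item}".encode()
            digest = hashlib.sha256(key).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


class BitwiseCounts:
    """One Bloom filter per bit position of the stored counts."""

    def __init__(self, width: int = 16, m: int = 4096, k: int = 4) -> None:
        self.filters = [BloomFilter(m, k, salt=i) for i in range(width)]

    def store(self, item: str, count: int) -> None:
        # Static assignment: call at most once per item (sketch assumption).
        if not 0 <= count < (1 << len(self.filters)):
            raise ValueError("count does not fit in the configured width")
        for i, f in enumerate(self.filters):
            if (count >> i) & 1:    # bit i of the count is set
                f.add(item)

    def estimate(self, item: str) -> int:
        # Reassemble the count from per-bit membership answers.
        return sum(1 << i for i, f in enumerate(self.filters) if item in f)
```

Under these assumptions, a false positive in any one filter can only turn a 0 bit into a 1, so `estimate` never returns less than the stored count; an error in a high-order filter costs more than one in a low-order filter.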
