The Counting Bloom Filter is an efficient multi-hash algorithm based on the Bloom Filter. It uses a space-efficient randomized data structure to represent a set with a certain allowable error rate, and it supports membership and multiplicity queries over the set. This paper targets sets whose item frequencies follow a heavy-tailed distribution. For such sets it presents a novel algorithm based on the Counting Bloom Filter, called the Multi-Granularities Counting Bloom Filter (MGCBF). The algorithm stores the item frequency information of the set in a hierarchical data structure built from several counting Bloom filters. Analysis of its time and space complexity shows that it can dramatically reduce the required space at the cost of a small amount of additional computation time. Experiments further indicate that, when the item frequencies of the target set follow a heavy-tailed distribution, the algorithm is more efficient than other algorithms with the same error probability.
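The sketch below illustrates the membership and multiplicity queries described above. It implements a plain counting Bloom filter, not the MGCBF hierarchy, whose details are not given here.

It makes two assumptions:

- The k counter positions are derived by double hashing from one SHA-256 digest.
- An item's multiplicity is estimated as the minimum of its k counters. This is the usual minimum-counter estimator, and it never underestimates the true count.

The class and method names (`CountingBloomFilter`, `add`, `estimate`, `contains`) are hypothetical. The counters are unbounded Python integers rather than the fixed-width counters a space-conscious implementation would use.

```python
import hashlib


class CountingBloomFilter:
    """Hypothetical illustration: m counters addressed by k hash positions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                  # number of counters
        self.k = k                  # number of hash positions per item
        self.counters = [0] * m     # unbounded ints (assumption, for simplicity)

    def _positions(self, item: str) -> list:
        # Assumption: double hashing h1 + i*h2 over one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str, count: int = 1) -> None:
        # Insertion increments every counter the item hashes to.
        for pos in self._positions(item):
            self.counters[pos] += count

    def estimate(self, item: str) -> int:
        # Multiplicity query: the minimum counter is an upper bound on the
        # true count, because collisions can only add to a counter.
        return min(self.counters[pos] for pos in self._positions(item))

    def contains(self, item: str) -> bool:
        # Membership query: an item is reported present if all its counters are > 0.
        return self.estimate(item) > 0


cbf = CountingBloomFilter(m=1024, k=4)
for word in ["a", "b", "a", "c", "a"]:
    cbf.add(word)
print(cbf.estimate("a"))  # never less than 3
```

Under a heavy-tailed frequency distribution, most items in such a filter stay small while a few grow large. Every counter must still be wide enough for the largest count, which is the space cost the abstract attributes to a single counting Bloom filter.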