Bloom filter variants for multiple sets: a comparative assessment

In this paper, we compare two probabilistic data structures for association queries derived from the well-known Bloom filter: the shifting Bloom filter (ShBF) and the spatial Bloom filter (SBF). Compared with the original data structure, both variants add the ability to store multiple subsets in the same filter, each using a different strategy. We analyse the performance of the two data structures with respect to false positive probability and inter-set error probability (the probability that an element of the set is recognised as belonging to the wrong subset). As part of our analysis, we extend the functionality of the shifting Bloom filter, optimising the filter for any non-trivial number of subsets. We propose a new generalised ShBF definition with applications outside of our specific domain, and present new probability formulas. The results of the comparison show that the ShBF provides better space efficiency than the SBF, but at a significantly higher computational cost.
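
To make the idea of storing multiple subsets in one filter concrete, the following Python sketch illustrates the general behaviour of a spatial Bloom filter as it is commonly described: each cell holds a subset label, a higher label overwrites a lower one on insertion, and a query returns the smallest label found among the probed cells. The class name `SpatialBloomFilter`, the salted SHA-256 hashing and the parameter names are assumptions made for illustration only; this is not the construction or the parameterisation evaluated in the paper.

```python
import hashlib


class SpatialBloomFilter:
    """Toy spatial Bloom filter: each cell stores a subset label; 0 means empty."""

    def __init__(self, num_cells: int, num_hashes: int) -> None:
        self.cells = [0] * num_cells
        self.num_hashes = num_hashes

    def _positions(self, item: str) -> list:
        # Derive one cell index per hash function by salting SHA-256 with
        # the hash index (an illustrative choice, not prescribed anywhere).
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % len(self.cells))
        return positions

    def insert(self, item: str, label: int) -> None:
        # Labels start at 1. A cell keeps the largest label written to it,
        # so higher-labelled subsets overwrite lower-labelled ones.
        if label < 1:
            raise ValueError("label must be a positive integer")
        for pos in self._positions(item):
            self.cells[pos] = max(self.cells[pos], label)

    def query(self, item: str) -> int:
        # Returns 0 if any probed cell is empty (the item is definitely absent).
        # Otherwise returns the smallest label among the probed cells, which
        # may be a false positive or an inter-set error.
        values = [self.cells[pos] for pos in self._positions(item)]
        return 0 if 0 in values else min(values)


sbf = SpatialBloomFilter(num_cells=1024, num_hashes=3)
sbf.insert("alice", 1)
sbf.insert("bob", 2)
print(sbf.query("bob"))    # label of the highest subset is never overwritten
print(sbf.query("alice"))  # at least 1; larger only if every cell was overwritten
```

In this sketch a query for an inserted element never returns 0, and the returned label is never smaller than the element's true label. An inter-set error occurs when all cells of an element are overwritten by a higher-labelled subset, which is one of the two error types the comparison considers.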
