The Bloom Paradox: When Not to Use a Bloom Filter

In this paper, we uncover the Bloom paradox in Bloom filters: sometimes a Bloom filter is harmful and should not be queried. We first analyze the conditions under which the Bloom paradox occurs in a Bloom filter and show that it depends on the a priori probability that a given element belongs to the represented set. We then show that the Bloom paradox also applies to counting Bloom filters (CBFs), where it depends on the product of the counters to which each element is hashed. In addition, we propose improved architectures that deal with the Bloom paradox in Bloom filters, CBFs, and their variants. We also apply this theory to cache sharing among Web proxies. Lastly, using simulations, we verify our theoretical results and show that our improved schemes can significantly improve the performance of Bloom filters and CBFs.
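The Bayesian reasoning behind the paradox can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' exact model. For a Bloom filter with false-positive rate f and prior p = P(x in S), Bayes' rule gives P(x in S | "yes") = p / (p + (1 - p) f). Under a hypothetical cost model with equal costs for false positives and false negatives, a "yes" is worth acting on only when this posterior exceeds 1/2, that is, when p > f/(1 + f). Below that threshold the answer does not change the best decision, so the query is wasted.

The CBF part makes two further assumptions. It uses a Poisson approximation in which background counter values follow Poisson(n k / m), and a member adds one to each of its k counters. Under these assumptions each counter value c contributes a likelihood ratio c/(n k / m), so the evidence is a product of counters. The names `bf_false_positive_rate`, `posterior_given_positive`, `should_query`, `cbf_posterior`, `cost_fp` and `cost_fn` are all hypothetical.

```python
import math
from typing import Sequence


def bf_false_positive_rate(m: int, n: int, k: int) -> float:
    """Usual approximation of a Bloom filter's false-positive rate,
    f = (1 - e^(-kn/m))^k, for m bits, n elements and k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def posterior_given_positive(prior: float, fpr: float) -> float:
    """Bayes' rule for P(x in S | filter answers 'yes'), assuming 0 < prior.
    A member always answers 'yes'; a non-member does so with probability fpr."""
    return prior / (prior + (1.0 - prior) * fpr)


def should_query(prior: float, fpr: float,
                 cost_fp: float = 1.0, cost_fn: float = 1.0) -> bool:
    """Hypothetical cost model: trusting a false positive costs cost_fp and
    missing a member costs cost_fn. A 'no' is always correct, so the query
    only matters after a 'yes'. It helps only if trusting that 'yes' has a
    lower expected cost than answering 'no' anyway."""
    post = posterior_given_positive(prior, fpr)
    return (1.0 - post) * cost_fp < post * cost_fn


def cbf_posterior(prior: float, counters: Sequence[int],
                  n: int, m: int, k: int) -> float:
    """P(x in S | values of the k counters x hashes to) for a CBF, under a
    Poisson approximation: background counter values ~ Poisson(lam) with
    lam = n*k/m, and a member adds one to each of its own counters.
    Per counter, P(Pois = c - 1) / P(Pois = c) = c / lam, so the evidence
    is the product of the counters divided by lam^k."""
    lam = n * k / m
    ratio = 1.0
    for c in counters:
        ratio *= c / lam
    if ratio == 0.0:
        return 0.0  # a zero counter rules out membership
    return prior * ratio / (prior * ratio + (1.0 - prior))


if __name__ == "__main__":
    m, n, k = 8000, 1000, 5  # hypothetical filter: 8 bits per element
    f = bf_false_positive_rate(m, n, k)
    print(f"false-positive rate f = {f:.4f}")
    print(f"equal-cost paradox threshold f/(1+f) = {f / (1 + f):.4f}")
    for prior in (0.5, 0.01, 0.001):
        print(prior, round(posterior_given_positive(prior, f), 3),
              should_query(prior, f))
    # Same prior, different counter values: larger counters raise the posterior.
    print(cbf_posterior(0.01, [1, 1, 1, 1, 1], n, m, k))
    print(cbf_posterior(0.01, [3, 2, 4, 2, 3], n, m, k))
```

With these hypothetical parameters, f is about 0.022, so under equal costs any prior below about 0.021 falls in the paradox region. In that region a "yes" still leaves membership less likely than not. In the CBF case, the same prior yields a much higher posterior when the element's counters are larger, which is why the product of the counters matters.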
