TinySet - An Access Efficient Self Adjusting Bloom Filter Construction

Bloom filters are a popular and efficient data structure for approximate set membership queries. However, they have several key limitations: they require about 44% more space than the information-theoretic lower bound, each operation accesses multiple memory words, and they do not support removals. This work presents TinySet, an alternative Bloom filter construction that is more space efficient than a Bloom filter for false positive rates below 2.8%, accesses only a single memory word per operation, and partially supports removals. TinySet is analyzed mathematically and tested extensively, and is shown to be fast and more space efficient than a variety of Bloom filter variants. TinySet also has low sensitivity to its configuration parameters and is therefore more flexible than a Bloom filter.
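The 44% figure comes from the standard Bloom filter analysis. With the optimal number of hash functions k = (m/n) ln 2, the usual approximation gives a false positive rate of ε = 2^(-k). An optimally configured Bloom filter therefore uses m/n = log2(1/ε) / ln 2 ≈ 1.44 log2(1/ε) bits per element, while the lower bound for approximate membership is log2(1/ε) bits per element. The sketch below computes both quantities. The function names are illustrative only. It covers just the standard Bloom filter and the lower bound, not TinySet, whose space analysis is not reproduced here.

```python
import math


def bloom_bits_per_element(eps: float) -> float:
    """Bits per element of an optimally configured standard Bloom filter.

    Uses the standard approximation: with k = (m/n) * ln 2 hash functions,
    eps = 2**(-k), so m/n = log2(1/eps) / ln 2.
    """
    return math.log2(1.0 / eps) / math.log(2)


def lower_bound_bits_per_element(eps: float) -> float:
    """Information-theoretic lower bound for approximate membership."""
    return math.log2(1.0 / eps)


def optimal_hash_count(eps: float) -> float:
    """Optimal (real-valued) number of hash functions, k = log2(1/eps)."""
    return math.log2(1.0 / eps)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        bf = bloom_bits_per_element(eps)
        lb = lower_bound_bits_per_element(eps)
        # The ratio is always 1/ln 2 (about 1.4427), i.e. roughly 44% overhead.
        print(f"eps={eps:<6} bloom={bf:6.2f} bits  bound={lb:6.2f} bits  "
              f"k={optimal_hash_count(eps):5.2f}  ratio={bf / lb:.4f}")
```

The ratio does not depend on ε. This is why the overhead of a standard Bloom filter is usually quoted as a single constant factor.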
