Vacuum Filters: More Space-Efficient and Faster Replacement for Bloom and Cuckoo Filters

We present vacuum filters, a type of data structure that supports approximate membership queries (AMQ). Vacuum filters use the least space among all known AMQ data structures and provide higher insertion and lookup throughput in most situations, so they can replace the widely used Bloom filters and cuckoo filters. Like cuckoo filters, vacuum filters store item fingerprints in a table. The gains in memory efficiency and throughput come from a new table insertion and fingerprint eviction strategy that achieves both a high load factor and data locality without any restriction on the table size. In addition, we propose a new update framework that resolves two difficult problems for AMQ structures under dynamic workloads, namely duplicate insertions and set resizing. The experiments show that, at the same false positive rate, vacuum filters use 25% less space on average than cuckoo filters with similar throughput, and 15% less space than Bloom filters with more than 10x the throughput. AMQ data structures are widely used in various layers of computer systems and networks and are usually hosted on platforms where memory is limited and precious; hence the improvements brought by vacuum filters can be considered significant.

PVLDB Reference Format: Minmei Wang, Mingxun Zhou, Shouqian Shi, Chen Qian. Vacuum Filters: More Space-Efficient and Faster Replacement for Bloom and Cuckoo Filters. PVLDB, 13(2): 197-210, 2019. DOI: https://doi.org/10.14778/3364324.3364333
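The fingerprint table and eviction strategy described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the class `VacuumStyleFilter`, its parameters (`chunk_len`, `slots`, `fp_bits`, `max_kicks`), the BLAKE2b hashing, and the single "victim" slot (a common trick in cuckoo filter implementations) are hypothetical choices made for illustration. The sketch assumes that the alternate bucket is found by XOR-ing the bucket index with a fingerprint-derived offset smaller than a power-of-two chunk length L. Both candidate buckets then lie in the same aligned chunk, which keeps evictions local, and the table size only has to be a multiple of L rather than a power of two. The paper's own refinements and parameter choices are not reproduced here.

```python
import hashlib
import random


class VacuumStyleFilter:
    """Hypothetical fingerprint filter with chunk-local alternate buckets.

    Illustrative sketch only, not the paper's implementation. Assumes the
    alternate bucket is i XOR (nonzero offset < chunk_len).
    """

    def __init__(self, num_buckets, chunk_len=64, slots=4, fp_bits=12, max_kicks=500, seed=0):
        if chunk_len < 2 or chunk_len & (chunk_len - 1):
            raise ValueError("chunk_len must be a power of two >= 2")
        if num_buckets % chunk_len:
            # Only a multiple of chunk_len is required, not a power of two.
            raise ValueError("num_buckets must be a multiple of chunk_len")
        self.m, self.L, self.slots = num_buckets, chunk_len, slots
        self.fp_bits, self.max_kicks = fp_bits, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.victim = None  # (bucket, fingerprint) left homeless by a failed eviction chain
        self.rng = random.Random(seed)

    def _index_and_fp(self, key: bytes):
        # One 128-bit BLAKE2b digest: the low half picks the bucket, the high half the fingerprint.
        d = hashlib.blake2b(key, digest_size=16).digest()
        i1 = int.from_bytes(d[:8], "little") % self.m
        fp = int.from_bytes(d[8:], "little") % ((1 << self.fp_bits) - 1) + 1  # in 1 .. 2**fp_bits - 1
        return i1, fp

    def _alt(self, i: int, fp: int) -> int:
        # XOR with a nonzero offset < L flips only the low log2(L) bits, so the result stays
        # in the same aligned chunk, differs from i, and _alt(_alt(i, fp), fp) == i.
        h = ((fp * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF) >> 32
        return i ^ (1 + h % (self.L - 1))

    def insert(self, key: bytes) -> bool:
        if self.victim is not None:
            return False  # treat the table as full once a fingerprint is homeless
        i1, fp = self._index_and_fp(key)
        i2 = self._alt(i1, fp)
        for b in (i1, i2):
            if len(self.buckets[b]) < self.slots:
                self.buckets[b].append(fp)
                return True
        b = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Swap the carried fingerprint with a random resident of the full bucket b.
            j = self.rng.randrange(self.slots)
            fp, self.buckets[b][j] = self.buckets[b][j], fp
            b = self._alt(b, fp)  # move the evicted fingerprint to its other bucket
            if len(self.buckets[b]) < self.slots:
                self.buckets[b].append(fp)
                return True
        self.victim = (b, fp)  # keep it so lookups never miss an inserted key
        return True

    def contains(self, key: bytes) -> bool:
        i1, fp = self._index_and_fp(key)
        i2 = self._alt(i1, fp)
        if fp in self.buckets[i1] or fp in self.buckets[i2]:
            return True
        return self.victim is not None and self.victim[1] == fp and self.victim[0] in (i1, i2)


if __name__ == "__main__":
    f = VacuumStyleFilter(num_buckets=960, chunk_len=64)  # 960 = 15 * 64, not a power of two
    members = [f"member-{n}".encode() for n in range(1800)]
    inserted = all(f.insert(k) for k in members)
    no_false_negatives = all(f.contains(k) for k in members)
    fp_rate = sum(f.contains(f"other-{n}".encode()) for n in range(10000)) / 10000
    print(inserted, no_false_negatives, fp_rate)
```

Because both candidate buckets of a fingerprint lie in one chunk, each chunk fills independently of the others. With small chunks an unlucky chunk can overflow while the rest of the table still has room, which is why this demo uses a large chunk (64 buckets) and a moderate load of about 47%.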
