Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

The Bloom filter provides fast approximate set membership while using little memory. Engineers often use these filters to avoid slow operations such as disk or network accesses. As an alternative, a cuckoo filter may need less space than a Bloom filter while also being faster. Chazelle et al. proposed a generalization of the Bloom filter called the Bloomier filter. Dietzfelbinger and Pagh described a variation on the Bloomier filter that can be used effectively for approximate membership queries; to our knowledge, it has never been tested empirically. We review an efficient implementation of their approach, which we call the xor filter. We find that xor filters can be faster than Bloom and cuckoo filters while using less memory. We further show that a more compact version of xor filters (xor+) can use even less space than highly compact alternatives (e.g., Golomb-compressed sequences) while providing speeds competitive with Bloom filters.
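The following is a minimal Python sketch of an 8-bit xor filter. It makes the construction concrete and assumes non-negative 64-bit integer keys. Each key hashes to three slots, one in each third of an array of about 1.23n + 32 one-byte fingerprints. A query accepts a key when the XOR of its three slots equals the key's fingerprint. Construction works in two phases:

1. It repeatedly "peels" a key that sits alone in one of its slots.
2. It then assigns slots in reverse peeling order, so that earlier assignments are never overwritten.

The class and helper names (`XorFilter8`, `_mix64`, `_slots`), the MurmurHash3 finalizer used as the hash, the seed sequence, and `max_attempts` are our own illustrative choices. This is not the authors' reference implementation.

```python
# Illustrative sketch only: names, hash function, and seeding are assumptions.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def _mix64(x: int) -> int:
    """MurmurHash3 64-bit finalizer (fmix64); combined with a seed in _hash."""
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB93FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def _reduce(x32: int, n: int) -> int:
    """Multiply-shift range reduction: maps a 32-bit value into [0, n)."""
    return (x32 * n) >> 32


class XorFilter8:
    """Hypothetical static filter with 8-bit fingerprints (keys: ints < 2**64)."""

    def __init__(self, keys, max_attempts=64):
        keys = list(set(keys))              # peeling requires distinct keys
        n = len(keys)
        capacity = int(1.23 * n) + 32       # about 1.23 slots per key
        self.block_length = capacity // 3   # three blocks, one slot per key in each
        size = 3 * self.block_length
        for attempt in range(max_attempts):
            self.seed = ((attempt + 1) * 0x9E3779B97F4A7C15) & MASK64
            count = [0] * size              # number of keys mapped to each slot
            xor_hash = [0] * size           # XOR of the hashes of those keys
            for h in map(self._hash, keys):
                for i in self._slots(h):
                    count[i] += 1
                    xor_hash[i] ^= h
            # Peel: detach any key that is alone in one of its slots.
            queue = [i for i in range(size) if count[i] == 1]
            stack = []
            while queue:
                i = queue.pop()
                if count[i] != 1:           # slot emptied since it was queued
                    continue
                h = xor_hash[i]             # the only key left in slot i
                stack.append((h, i))
                for j in self._slots(h):
                    count[j] -= 1
                    xor_hash[j] ^= h
                    if count[j] == 1:
                        queue.append(j)
            if len(stack) == n:             # every key was peeled
                break
        else:
            raise RuntimeError("construction failed; try more attempts")
        # Assign in reverse peeling order so that, for every key,
        # B[h0] ^ B[h1] ^ B[h2] equals its fingerprint.
        self.fingerprints = bytearray(size)
        b = self.fingerprints
        for h, i in reversed(stack):
            h0, h1, h2 = self._slots(h)
            b[i] = self._fingerprint(h) ^ b[h0] ^ b[h1] ^ b[h2]  # b[i] is still 0

    def _hash(self, key: int) -> int:
        return _mix64((key + self.seed) & MASK64)

    def _slots(self, h: int):
        bl = self.block_length
        return (_reduce(h & MASK32, bl),
                _reduce(_rotl64(h, 21) & MASK32, bl) + bl,
                _reduce(_rotl64(h, 42) & MASK32, bl) + 2 * bl)

    @staticmethod
    def _fingerprint(h: int) -> int:
        return (h ^ (h >> 32)) & 0xFF

    def __contains__(self, key: int) -> bool:
        h = self._hash(key)
        h0, h1, h2 = self._slots(h)
        b = self.fingerprints
        return self._fingerprint(h) == b[h0] ^ b[h1] ^ b[h2]
```

Usage: build with `XorFilter8(keys)` and test membership with `key in xf`. Every inserted key is reported as present. A non-member passes with probability near 2^-8. For large n, the ⌊1.23n⌋ + 32 one-byte slots come to roughly 9.8 bits per key. For comparison, a Bloom filter tuned for the same false-positive rate needs about 8/ln 2 ≈ 11.5 bits per key. The sketch does not cover the xor+ variant.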

References

[1] Raghu Ramakrishnan et al. bLSM: a general purpose log structured merge tree, 2012, SIGMOD Conference.

[2] Martin Dietzfelbinger and Rasmus Pagh. Succinct Data Structures for Retrieval and Approximate Membership, 2008, ICALP.

[3] Dong Zhou et al. Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences, 2013, SEA.

[4] Andrei Broder et al. Network Applications of Bloom Filters: A Survey, 2004, Internet Math.

[5] Pei Cao et al. Hash-AV: fast virus signature scanning by cache-resident filters, 2005, GLOBECOM '05. IEEE Global Telecommunications Conference, 2005.

[6] Peter Sanders et al. Cache-, hash-, and space-efficient bloom filters, 2009, JEAL.

[7] Sebastiano Vigna et al. Broadword Implementation of Rank/Select Queries, 2008, WEA.

[8] Norman May et al. Interleaving with Coroutines: A Practical Approach for Robust Index Joins, 2017, Proc. VLDB Endow.

[9] Jim Hunter et al. Exploiting Coroutines to Attack the "Killer Nanoseconds", 2018, Proc. VLDB Endow.

[10] Michael A. Bender et al. A General-Purpose Counting Filter: Making Every Bit Count, 2017, SIGMOD Conference.

[11] Kenneth A. Ross et al. Vectorized Bloom filters for advanced SIMD processors, 2014, DaMoN '14.

[12] Michael Mitzenmacher et al. Compressed bloom filters, 2002, TNET.

[13] David A. Patterson et al. Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server, 2015, 2015 IEEE International Symposium on Workload Characterization.

[14] Rasmus Pagh et al. Simple and Space-Efficient Minimal Perfect Hash Functions, 2007, WADS.

[15] Xiaozhou Li et al. Algorithmic improvements for fast concurrent Cuckoo hashing, 2014, EuroSys '14.

[16] Mario Zagar et al. Adapting the Bloom filter to multithreaded environments, 2010, Melecon 2010 - 2010 15th IEEE Mediterranean Electrotechnical Conference.

[17] John W. Lockwood et al. Deep packet inspection using parallel Bloom filters, 2003, 11th Symposium on High Performance Interconnects, 2003. Proceedings.

[18] Ely Porat et al. An Optimal Bloom Filter Replacement Based on Matrix Solving, 2008, CSR.

[19] Dean M. Tullsen et al. Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing, 2016, USENIX Annual Technical Conference.

[20] Kumar Chellapilla et al. Bloomier Filters: A second look, 2008, ESA.

[21] Daniel Lemire et al. Fast Random Integer Generation in an Interval, 2018, ACM Trans. Model. Comput. Simul.

[22] Bin Fan et al. Cuckoo Filter: Practically Better Than Bloom, 2014, CoNEXT.

[23] Bernard Chazelle et al. The Bloomier filter: an efficient data structure for static support lookup tables, 2004, SODA '04.

[24] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.

[25] Michael Molloy. Cores in random hypergraphs and Boolean formulas, 2005.

[26] Patrick E. O'Neil et al. The log-structured merge-tree (LSM-tree), 1996, Acta Informatica.

[27] Isaac Keslassy et al. The Variable-Increment Counting Bloom Filter, 2012, IEEE/ACM Transactions on Networking.

[28] Daniel Lemire et al. Regular and almost universal hashing: an efficient implementation, 2016, Softw. Pract. Exp.

[29] Fan Deng et al. Approximately detecting duplicates for streaming data using stable bloom filters, 2006, SIGMOD Conference.

[30] Michael J. Smith et al. XOR-Satisfiability Set Membership Filters, 2018, SAT.

[31] George Varghese et al. An Improved Construction for Counting Bloom Filters, 2006, ESA.

[32] Wilson C. Hsieh et al. Bigtable: A Distributed Storage System for Structured Data, 2006, TOCS.

[33] Nuwan Jayasena et al. Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity, 2018, Proc. VLDB Endow.

[34] Lieven Eeckhout et al. Boosting the Priority of Garbage, 2016, ACM Trans. Archit. Code Optim.

[35] Alfons Kemper et al. Performance-Optimal Filtering: Bloom overtakes Cuckoo at High-Throughput, 2019, Proc. VLDB Endow.

[36] Maya Gokhale et al. Language classification using n-grams accelerated by FPGA-based Bloom filters, 2007, HPRCTA.