Hash Adaptive Bloom Filter

A Bloom filter is a compact, memory-efficient probabilistic data structure that supports membership testing, i.e., checking whether an element is in a given set. However, because a Bloom filter maps every element with uniformly random hash functions, it offers little flexibility even when information about negative keys (elements that are not in the set) is available. The problem worsens when misidentifying different negative keys incurs different costs. To address these problems, we propose a new Hash Adaptive Bloom Filter (HABF) that supports customizing the hash functions used for individual keys. The key idea of HABF is to customize the hash functions for positive keys (elements that are in the set) so that they avoid colliding with high-cost negative keys, and to pack the customized hash functions into a lightweight data structure named HashExpressor. Then, given an element at query time, HABF follows a two-round pattern to check whether the element is in the set. Further, we theoretically analyze the performance of HABF and bound its expected false positive rate. We conduct extensive experiments on representative datasets, and the results show that HABF overall outperforms the standard Bloom filter and its state-of-the-art variants in terms of accuracy, construction time, query time, and memory consumption. The source code is available at [1].
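To make the idea above concrete, the following Python sketch shows a toy filter in the same spirit. It is an illustration under stated assumptions, not the authors' implementation: the class `ToyHashAdaptiveFilter`, the `cells` array (a simplified stand-in for HashExpressor), the greedy choice of hash group per positive key, and the round order (default hashes first, then the customized group) are all hypothetical choices made for this example.

```python
import hashlib
from functools import lru_cache


@lru_cache(maxsize=None)
def h(key: str, seed: int, m: int) -> int:
    """Seeded hash of `key` onto a position in [0, m)."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


class ToyHashAdaptiveFilter:
    # Candidate groups of hash seeds; group 0 is the default group.
    GROUPS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]

    def __init__(self, m_bits: int, n_cells: int):
        self.m = m_bits
        self.bits = [False] * m_bits
        # Hypothetical stand-in for HashExpressor: cell -> hash group id.
        self.cells = [0] * n_cells

    def _positions(self, key: str, group: int):
        return [h(key, seed, self.m) for seed in self.GROUPS[group]]

    def _cell(self, key: str) -> int:
        return h(key, 99, len(self.cells))

    def _passes(self, key: str, group: int) -> bool:
        return all(self.bits[p] for p in self._positions(key, group))

    def query(self, key: str) -> bool:
        # Round 1 (assumed order): test with the default hash group.
        if self._passes(key, 0):
            return True
        # Round 2: look up a customized group and test again.
        group = self.cells[self._cell(key)]
        return group != 0 and self._passes(key, group)

    def fp_cost(self, negatives: dict) -> float:
        """Total cost of the known negative keys that are misidentified."""
        return sum(cost for key, cost in negatives.items() if self.query(key))

    def build(self, positives, negatives: dict) -> None:
        for key in positives:
            cell = self._cell(key)
            current = self.cells[cell]
            # A cell already claimed by another key fixes the custom group.
            options = [0, current] if current else range(len(self.GROUPS))
            best_cost, best_group = None, 0
            for group in options:
                # Tentatively apply this option, measure the cost, then undo.
                fresh = [p for p in set(self._positions(key, group))
                         if not self.bits[p]]
                for p in fresh:
                    self.bits[p] = True
                self.cells[cell] = group or current
                cost = self.fp_cost(negatives)
                for p in fresh:
                    self.bits[p] = False
                self.cells[cell] = current
                if best_cost is None or cost < best_cost:
                    best_cost, best_group = cost, group
            # Commit the cheapest option (ties keep the earliest option).
            for p in self._positions(key, best_group):
                self.bits[p] = True
            if best_group:
                self.cells[cell] = best_group


positives = [f"pos-{i}" for i in range(100)]
# Hypothetical costs: the first ten negative keys are ten times as costly.
negatives = {f"neg-{i}": (10.0 if i < 10 else 1.0) for i in range(100)}

habf = ToyHashAdaptiveFilter(m_bits=600, n_cells=64)
habf.build(positives, negatives)
print("weighted false-positive cost:", habf.fp_cost(negatives))
```

In this sketch a positive key is never reported as absent: a key inserted with the default group passes round 1, and a key inserted with a customized group passes round 2 because its cell keeps that group for the rest of construction. The greedy choice is only a heuristic, so the sketch makes no claim about how much cost it saves relative to a standard Bloom filter.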

[1] Raghu Ramakrishnan, et al. bLSM: a general purpose log structured merge tree, 2012, SIGMOD Conference.

[2] Ming Zhong, et al. Optimizing data popularity conscious bloom filters, 2008, PODC '08.

[3] David Hung-Chang Du, et al. AC-Key: Adaptive Caching for LSM-based Key-Value Stores, 2020, USENIX Annual Technical Conference.

[4] Graham Cormode, et al. What's hot and what's not: tracking most frequent items dynamically, 2003, PODS '03.

[5] Michael Mitzenmacher, et al. Compressed bloom filters, 2001, PODC '01.

[6] Amitabha Bagchi, et al. Adaptive Learned Bloom Filters under Incremental Workloads, 2020, COMAD/CODS.

[7] Fan Guo, et al. ElasticBF: Elastic Bloom Filter with Hotness Awareness for Boosting Read Performance in Large Key-Value Stores, 2019, USENIX Annual Technical Conference.

[8] Patrick E. O'Neil, et al. The log-structured merge-tree (LSM-tree), 1996, Acta Informatica.

[9] Guy M. Lohman, et al. Optimizer Validation and Performance Evaluation for Distributed Queries, 1998.

[10] J. J. Hopfield, et al. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[11] Sergey Bartunov, et al. Meta-Learning Neural Bloom Filters, 2019, ICML.

[12] Jie Gao, et al. Weighted Bloom filter, 2006, 2006 IEEE International Symposium on Information Theory.

[13] Panagiotis Manolios, et al. Adaptive approximate state storage, 2010.

[14] Michael Mitzenmacher, et al. A Model for Learned Bloom Filters and Optimizing by Sandwiching, 2018, NeurIPS.

[15] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[16] Daniel Lemire, et al. Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters, 2019.

[17] Adam Silberstein, et al. Benchmarking cloud serving systems with YCSB, 2010, SoCC '10.

[18] Jie Wu, et al. The Dynamic Bloom Filters, 2010, IEEE Transactions on Knowledge and Data Engineering.

[19] Yossi Matias, et al. Spectral bloom filters, 2003, SIGMOD '03.

[20] Fan Deng, et al. Approximately detecting duplicates for streaming data using stable bloom filters, 2006, SIGMOD Conference.

[21] Burton H. Bloom, et al. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.

[22] Fang Hao, et al. Building high accuracy bloom filters using partitioned hashing, 2007, SIGMETRICS '07.

[23] Tim Kraska, et al. The Case for Learned Index Structures, 2018.

[24] Zhenwei Dai, et al. Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier, 2019, NeurIPS.

[25] Li Fan, et al. Web caching and Zipf-like distributions: evidence and implications, 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[26] Wei Chen, et al. A novel approach to detecting DDoS Attacks at an Early Stage, 2006, The Journal of Supercomputing.

[27] Andrei Broder, et al. Network Applications of Bloom Filters: A Survey, 2004, Internet Math.

[28] Michael Mitzenmacher, et al. Less hashing, same performance: Building a better Bloom filter, 2006, Random Struct. Algorithms.

[29] Christopher Olston, et al. Distributed top-k monitoring, 2003, SIGMOD '03.

[30] Ali A. Ghorbani, et al. A Performance Evaluation of Hash Functions for IP Reputation Lookup Using Bloom Filters, 2015, 2015 10th International Conference on Availability, Reliability and Security.

[31] David M. W. Powers, et al. Applications and Explanations of Zipf’s Law, 1998, CoNLL.