Adaptive Learned Bloom Filters under Incremental Workloads

The recently proposed paradigm of learned Bloom filters (LBF) appears to offer significant advantages over traditional Bloom filters in memory footprint and overall performance, as evidenced by empirical evaluations on static data. However, its behavior in the presence of updates to the set of keys stored in the filter is not well understood. At the same time, maintaining the false positive rate (FPR) of traditional Bloom filters under dynamic workloads has been studied, and extensions that carefully expand a filter's memory footprint without sacrificing its FPR have been proposed. Building on these, we propose two distinct approaches for handling the data updates encountered in practical uses of LBF: (i) CA-LBF, a classifier-adaptive method that adjusts the learned model (e.g., by retraining) to accommodate new "unseen" data, and (ii) IA-LBF, an index-adaptive method that replaces the traditional backup Bloom filter with an adaptive variant while keeping the learned model unchanged. In this paper, we explore these two approaches in detail under incremental workloads, evaluating their adaptability, memory footprint, and false positive rates. Our empirical results on a variety of datasets and learned models of varying complexity show that both proposed methods handle incremental updates robustly.
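To make the index-adaptive approach concrete, the minimal Python sketch below pairs a frozen scoring model with a scalable backup Bloom filter that appends geometrically tightened sub-filters as it fills, keeping the overall FPR bounded as keys arrive. All names here (IndexAdaptiveLBF, ScalableBloomFilter, score_fn, tightening) are illustrative assumptions rather than the implementation evaluated in this paper, and the backup uses the standard scalable-Bloom-filter construction rather than any particular adaptive filter:

    import math
    import hashlib

    class BloomFilter:
        """Fixed-capacity Bloom filter using double hashing over a bit array."""

        def __init__(self, capacity, fpr):
            # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m/n) ln 2 hashes.
            self.capacity = capacity
            self.m = max(8, math.ceil(-capacity * math.log(fpr) / math.log(2) ** 2))
            self.k = max(1, round(self.m / capacity * math.log(2)))
            self.bits = bytearray((self.m + 7) // 8)
            self.count = 0

        def _positions(self, key):
            # Derive k positions from one digest (Kirsch-Mitzenmacher: h1 + i*h2 mod m).
            d = hashlib.sha256(key.encode("utf-8")).digest()
            h1 = int.from_bytes(d[:8], "big")
            h2 = int.from_bytes(d[8:16], "big") | 1
            return ((h1 + i * h2) % self.m for i in range(self.k))

        def add(self, key):
            for p in self._positions(key):
                self.bits[p >> 3] |= 1 << (p & 7)
            self.count += 1

        def __contains__(self, key):
            return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))

    class ScalableBloomFilter:
        """Adaptive backup: when the current sub-filter reaches capacity, append a
        larger one with a geometrically tightened FPR, so the sum of per-filter
        FPRs (a geometric series) stays below the target."""

        def __init__(self, initial_capacity=1024, target_fpr=0.01, growth=2, tightening=0.5):
            self.initial_capacity = initial_capacity
            self.target_fpr = target_fpr
            self.growth = growth
            self.tightening = tightening
            self.filters = [BloomFilter(initial_capacity, target_fpr * (1 - tightening))]

        def add(self, key):
            if self.filters[-1].count >= self.filters[-1].capacity:
                i = len(self.filters)
                self.filters.append(BloomFilter(
                    self.initial_capacity * self.growth ** i,
                    self.target_fpr * (1 - self.tightening) * self.tightening ** i,
                ))
            self.filters[-1].add(key)

        def __contains__(self, key):
            return any(key in f for f in self.filters)

    class IndexAdaptiveLBF:
        """IA-LBF sketch: a frozen learned model screens queries; keys the model
        scores below the threshold go into the adaptive backup filter, so
        inserts never require retraining."""

        def __init__(self, score_fn, threshold, **backup_kwargs):
            self.score_fn = score_fn          # key -> score in [0, 1]; frozen under IA-LBF
            self.threshold = threshold
            self.backup = ScalableBloomFilter(**backup_kwargs)

        def add(self, key):
            # Only keys the model would reject need to be remembered explicitly.
            if self.score_fn(key) < self.threshold:
                self.backup.add(key)

        def __contains__(self, key):
            return self.score_fn(key) >= self.threshold or key in self.backup

    # Toy usage with a hypothetical length-based "model" standing in for a classifier.
    lbf = IndexAdaptiveLBF(score_fn=lambda k: min(len(k) / 10, 1.0), threshold=0.5)
    for key in ["short", "a-much-longer-key"]:
        lbf.add(key)
    assert all(key in lbf for key in ["short", "a-much-longer-key"])

Under CA-LBF, by contrast, the same insert would eventually trigger retraining of score_fn on the updated key set and a rebuild of a fixed-size backup filter, trading retraining cost for a smaller index.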
