On the Choice of General Purpose Classifiers in Learned Bloom Filters: An Initial Analysis Within Basic Filters

Bloom Filters are a fundamental and pervasive data structure. Within the growing area of Learned Data Structures, several Learned versions of Bloom Filters have been proposed, offering advantages over classic filters. Each of them uses a classifier, which is the Learned part of the data structure. Although the classifier plays a central role in these new filters, and its space footprint as well as its classification time can affect the performance of the Learned Filter, no systematic study of which specific classifier to use in which circumstances is available. We report progress in this area here, also providing initial guidelines on how to choose a classifier among five classic classification paradigms.
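To make the architecture concrete, the sketch below shows one common way a Learned Bloom Filter can be assembled: a pre-trained classifier scores each queried key, and a small backup Bloom filter stores the keys the classifier rejects, so the structure never produces false negatives. This is a minimal illustration under stated assumptions, not any specific paper's implementation; the class names, the featurize callback, the threshold parameter, and the scikit-learn-style predict_proba interface are all assumptions made for the example.

```python
# Minimal sketch of a Learned Bloom Filter: classifier + backup Bloom filter.
# Assumes a trained classifier exposing predict_proba (scikit-learn style) and
# a user-supplied featurize(key) callback; these are illustrative assumptions.
import hashlib


class BloomFilter:
    """Plain Bloom filter, used here as the backup filter."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class LearnedBloomFilter:
    """Keys scored below the threshold at build time go into the backup filter,
    so every inserted key is reported as present (no false negatives)."""

    def __init__(self, classifier, featurize, threshold,
                 backup_bits=1024, backup_hashes=3):
        self.classifier = classifier      # any model with predict_proba
        self.featurize = featurize        # maps a key to a feature vector
        self.threshold = threshold        # acceptance threshold on the score
        self.backup = BloomFilter(backup_bits, backup_hashes)

    def _score(self, key):
        # Probability that the key belongs to the stored set (class 1).
        return self.classifier.predict_proba([self.featurize(key)])[0][1]

    def build(self, keys):
        # Insert into the backup filter every key the classifier would miss.
        for key in keys:
            if self._score(key) < self.threshold:
                self.backup.add(key)

    def query(self, key):
        return self._score(key) >= self.threshold or key in self.backup
```

In this sketch the classifier's space footprint and its per-query scoring cost are exactly the quantities the abstract points to: a larger or slower model may reduce the backup filter's size but increase classification time, which is the trade-off a choice among classifier paradigms has to balance.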
