Constructions and Applications for Accurate Counting of the Bloom Filter False Positive Free Zone

Bloom filters are used in many networking applications to answer set membership queries at low cost, but they suffer from false positives. We study Bloom filter constructions that completely avoid false positives when representing a set of size up to d taken from a finite universe of size n. We suggest memory-efficient Bloom filter constructions with a false positive free zone whose memory depends linearly on the set size, allowing larger sets to be represented. Our first construction relies on Orthogonal Latin Square (OLS) codes, and the second relies on representing elements through values of polynomials defined modulo primes. Beyond Bloom filters supporting set membership, we also consider sketches that provide more general functionality, such as flow size estimation. In particular, we show how the false positive free zone enables accurate size estimation in the well-known Count-Min sketch. We compare the new constructions to existing approaches through analytical and experimental evaluations, which show their superiority.
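The principle behind a polynomial-based false positive free zone can be illustrated with a classic Reed-Solomon-style construction. This is a sketch under our own assumptions, not necessarily the exact construction described above. For a prime p, an element x of a universe of size n = p^(r+1) is identified with the polynomial f_x of degree at most r whose coefficients are the base-p digits of x, and row i uses position f_x(i) mod p. Two distinct polynomials of degree at most r agree on at most r points. So with k = dr + 1 rows, a non-member collides with at most d members in at most dr rows and always keeps one empty position, while a member collides with the others in at most (d-1)r rows and keeps one exact counter. The class `PolyFPFCountMin`, the helper `is_prime`, and the parameters p, r, d are hypothetical names. The same positions serve both as a Bloom filter (a bit is set iff its counter is nonzero) and as a Count-Min sketch.

```python
def is_prime(p):
    """Trial-division primality check (adequate for the small p used here)."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True


class PolyFPFCountMin:
    """Hypothetical Count-Min sketch with polynomial (mod prime p) positions.

    Element x in {0, ..., p**(r+1) - 1} is identified with the polynomial
    f_x(t) = c_0 + c_1 t + ... + c_r t^r over GF(p), where c_j are the
    base-p digits of x. Row i (i = 0..k-1) uses counter f_x(i) mod p.
    With k = d*r + 1 rows, every estimate is exact while at most d
    distinct elements have been inserted.
    """

    def __init__(self, p, r, d):
        if not is_prime(p):
            raise ValueError("p must be prime")
        self.k = d * r + 1  # enough rows so one collision-free row always remains
        if self.k > p:
            raise ValueError("need p >= d*r + 1 distinct evaluation points")
        self.p, self.r, self.d = p, r, d
        self.universe = p ** (r + 1)
        self.rows = [[0] * p for _ in range(self.k)]

    def positions(self, x):
        """Return [f_x(0), f_x(1), ..., f_x(k-1)] modulo p."""
        if not 0 <= x < self.universe:
            raise ValueError("element outside the universe")
        coeffs = []
        for _ in range(self.r + 1):
            x, c = divmod(x, self.p)  # next base-p digit is the next coefficient
            coeffs.append(c)
        out = []
        for a in range(self.k):
            v = 0
            for c in reversed(coeffs):  # Horner's rule, highest degree first
                v = (v * a + c) % self.p
            out.append(v)
        return out

    def add(self, x, count=1):
        for row, pos in zip(self.rows, self.positions(x)):
            row[pos] += count

    def estimate(self, x):
        """Count-Min estimate: the minimum counter over all rows."""
        return min(row[pos] for row, pos in zip(self.rows, self.positions(x)))

    def contains(self, x):
        """Bloom-filter view: x is reported present iff every counter is nonzero."""
        return self.estimate(x) > 0


# Universe of 7**3 = 343 elements, zone of up to d = 3 distinct elements,
# polynomials of degree <= r = 2, hence k = 7 rows of width 7.
cm = PolyFPFCountMin(p=7, r=2, d=3)
for x, c in [(5, 4), (100, 2), (342, 9)]:
    cm.add(x, c)
print(cm.estimate(5), cm.estimate(100), cm.estimate(342))  # 4 2 9
print(sum(cm.contains(y) for y in range(343)))  # 3 (no false positives)
```

This sketch stores k·p = (dr + 1)p counters and needs p ≥ dr + 1. It only illustrates why the zone is free of false positives and overestimates, and makes no claim to match the memory bounds of the constructions studied here.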
