Sliding Bloom Filters

A Bloom filter is a method for reducing the space (memory) required to represent a set, at the cost of a small error probability. In this paper we consider a Sliding Bloom Filter: a data structure that, given a stream of elements, supports membership queries on the set of the last n elements (a sliding window), while allowing a small error probability and a slackness parameter. The problem of sliding Bloom filters has appeared in the literature of several communities, but this work is the first theoretical investigation of it.
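To make the interface concrete, the following is a minimal sketch of a sliding-window approximate membership structure. It is not the construction analyzed in this paper. It is a simple two-generation baseline that rotates two ordinary Bloom filters, shown only to illustrate the query semantics. The class names, the parameters `num_bits` and `num_hashes`, and the reading of the slackness parameter (elements older than the window but within the slack may be answered either way) are assumptions made for illustration. With a window of n elements, this sketch remembers between n and 2n - 1 of the most recent elements, so its slack is just under n.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class TwoGenerationSlidingFilter:
    """Hypothetical baseline: two Bloom filters rotated every n insertions."""

    def __init__(self, n, num_bits=4096, num_hashes=4):
        self.n = n
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.current = BloomFilter(num_bits, num_hashes)
        self.previous = BloomFilter(num_bits, num_hashes)
        self.count = 0  # number of elements inserted into `current`

    def insert(self, item):
        self.current.add(item)
        self.count += 1
        if self.count == self.n:
            # `current` is full: it becomes `previous`, and the old
            # `previous` generation is forgotten entirely.
            self.previous = self.current
            self.current = BloomFilter(self.num_bits, self.num_hashes)
            self.count = 0

    def query(self, item):
        # Never a false negative for the last n elements; older elements
        # are reported only within the slack or as Bloom false positives.
        return item in self.current or item in self.previous


if __name__ == "__main__":
    window = TwoGenerationSlidingFilter(n=100)
    for i in range(1000):
        window.insert(f"x{i}")
    print(window.query("x999"))  # a recent element
    print(window.query("x0"))    # an element far outside the window
```

The rotation makes deletion of old elements trivial at the cost of roughly doubling the space of a single filter and giving only coarse control over the slack, which is the kind of trade-off a dedicated sliding Bloom filter aims to improve.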
