A Compact In-memory Index for Managing Set Membership Queries on Streaming Data

Membership queries on dynamic sets are essential for applications that generate or process a continuous stream of data items. These applications often need to cache items dynamically and answer membership queries for duplicate detection on unbounded data streams. Three key challenges for the caching mechanism are limited memory space, the high precision requirement, and the different priority levels associated with items. In this paper, we propose a compact in-memory index, Bloom Filter Ring (BFR), which is better suited to dynamic caching of items on unbounded data streams. We analyze the time complexity and precision of BFR in finite memory space, and prove theoretically that BFR has a higher expected average capacity than the Aging Bloom Filter, the current state of the art. Furthermore, we propose Priority-aware BFR (PBFR), a membership query scheme that takes the priority levels of items into account. Experimental results show that our algorithms achieve better performance in terms of cache hit ratio and false negative rate.
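For orientation, the duplicate-detection task described above can be sketched with a classic fixed-size Bloom filter (Bloom, 1970). This is only a baseline illustration, not the BFR or PBFR structure, whose details the abstract does not give. The class name `BloomFilter`, the helper `is_duplicate`, the parameters `num_bits` and `num_hashes`, and the SHA-256-based double hashing are all assumptions made for this sketch.

```python
import hashlib


class BloomFilter:
    """Classic fixed-size Bloom filter; a baseline sketch, not BFR."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one SHA-256
        # digest (double hashing). The hash choice is an assumption.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def is_duplicate(seen: BloomFilter, item: str) -> bool:
    """Report whether item was (probably) seen before, then record it."""
    duplicate = item in seen
    seen.add(item)
    return duplicate
```

Calling `is_duplicate(seen, item)` for each arriving item always flags a repeated item, because a classic Bloom filter has no false negatives, while an unseen item may occasionally be flagged as a false positive. Since this baseline never evicts anything, its bit array fills up on an unbounded stream and the false positive rate grows, which is the limitation that dynamic schemes for streaming data aim to address.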
