An Approximate Duplicate-Elimination in RFID Data Streams Based on d-Left Time Bloom Filter

RFID data streams contain a large number of duplicates, because an RFID tag is read multiple times by one reader or by several readers deployed in the same region of an RFID-based system. Existing duplicate-elimination methods based on the Time Bloom filter (TBF) require multiple counters to store the detected time of an element in an RFID data stream, and thus waste valuable memory. In this paper, we devise the d-left Time Bloom filter (DLTBF) as an extension of the d-left Counting Bloom filter. Using d-left hashing, a balanced allocation mechanism, DLTBF stores the detected time of an element in a single counter. We then propose a one-pass approximate method based on DLTBF to remove duplicates in RFID data streams. Suppose that the detected time of an element takes T bits to store, that the number of non-duplicate elements within a time length of τ is W, and that the probability that our method takes a non-duplicate element to be a duplicate (the false positive probability) is e. Then the number of bits used by our method is O(W log_2(1/e) + WT). Experimental results on synthetic data verify the effectiveness of our method.
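The idea of keeping one (fingerprint, detected time) cell per element, placed by d-left hashing, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the class name `DLeftTimeFilter`, its parameters, and the policies below are all assumptions made for illustration. In particular, it uses one salted BLAKE2b hash per subtable instead of a single hash followed by permutations, Python dictionaries instead of packed bit cells, it does not refresh the stored time when a duplicate is seen, and it silently skips storage when a bucket is full.

```python
import hashlib


class DLeftTimeFilter:
    """Illustrative d-left filter storing one detected time per fingerprint."""

    def __init__(self, d=4, buckets=1024, cells=8, fp_bits=16, tau=10):
        self.d = d                  # number of subtables
        self.buckets = buckets      # buckets per subtable
        self.cells = cells          # cell capacity of each bucket
        self.fp_mask = (1 << fp_bits) - 1
        self.tau = tau              # duplicate window length
        # tables[i][b] maps fingerprint -> detected time (one cell per entry)
        self.tables = [[{} for _ in range(buckets)] for _ in range(d)]

    def _slots(self, tag):
        """Return one (bucket index, fingerprint) pair per subtable."""
        slots = []
        for i in range(self.d):
            digest = hashlib.blake2b(
                tag.encode(), digest_size=8, salt=bytes([i])
            ).digest()
            h = int.from_bytes(digest, "big")
            slots.append((h % self.buckets, (h // self.buckets) & self.fp_mask))
        return slots

    def is_duplicate(self, tag, now):
        """One-pass check: True if tag was recorded less than tau ago."""
        slots = self._slots(tag)
        # Drop expired cells in the candidate buckets only.
        for i, (b, _) in enumerate(slots):
            bucket = self.tables[i][b]
            for fp in [fp for fp, t in bucket.items() if now - t >= self.tau]:
                del bucket[fp]
        # A surviving fingerprint match means a recent reading exists.
        for i, (b, fp) in enumerate(slots):
            if fp in self.tables[i][b]:
                return True
        # d-left placement: least-loaded candidate bucket, ties go left.
        i = min(range(self.d), key=lambda j: len(self.tables[j][slots[j][0]]))
        b, fp = slots[i]
        if len(self.tables[i][b]) < self.cells:  # assumed overflow policy: skip
            self.tables[i][b][fp] = now
        return False


f = DLeftTimeFilter(tau=10)
readings = [("tagA", 0), ("tagB", 1), ("tagA", 3), ("tagA", 12), ("tagB", 20)]
flags = [f.is_duplicate(tag, t) for tag, t in readings]
```

In this sketch each element occupies a single cell holding a fingerprint and a time, which mirrors the O(W log_2(1/e) + WT) shape of the bound: roughly log_2(1/e) bits of fingerprint plus T bits of time for each of the W elements in the window. Fingerprint collisions within a bucket are the source of false positives.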
