Leveraging spatio-temporal redundancy for RFID data cleansing

Radio Frequency Identification (RFID) technologies are used in many applications for data collection. However, raw RFID readings are usually of low quality and may contain many anomalies. An ideal solution for RFID data cleansing should address the following issues. First, in many applications, duplicate readings of the same object (by multiple readers simultaneously or by a single reader over a period of time) are very common, and the solution should exploit the resulting data redundancy for cleansing. Second, prior knowledge about the readers and the environment (e.g., the prior data distribution or the false negative rates of readers) may help improve data quality and remove anomalies, and a desirable solution must be able to quantify the degree of uncertainty based on such knowledge. Third, the solution should exploit constraints given by the target application (e.g., the number of objects in the same location cannot exceed a given value) to improve cleansing accuracy. A number of RFID data cleansing techniques exist, but none of them supports all of the aforementioned features. In this paper we propose a Bayesian-inference-based approach for cleansing raw RFID data. Our approach takes full advantage of data redundancy. To capture the likelihood, we design an n-state detection model and formally prove that the 3-state model can maximize the system performance. Moreover, to sample from the posterior, we devise a Metropolis-Hastings sampler with Constraints (MH-C), which incorporates constraint management to clean raw RFID data with high efficiency and accuracy. We validate our solution with a common RFID application and demonstrate the advantages of our approach through extensive simulations.
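
To make the idea of constraint-aware posterior sampling concrete, the following is a minimal Python sketch of a Metropolis-Hastings sampler that rejects any proposal violating a per-location capacity constraint. It is not the paper's MH-C algorithm or its n-state detection model: the two-rate likelihood (`p_hit`, `p_false`), the uniform prior, the one-reader-per-location layout, and all function and parameter names are assumptions chosen purely for illustration.

```python
import math
import random


def log_likelihood(loc, reads, p_hit, p_false):
    """Log-probability of one object's readings if it sits at `loc`.

    Assumed model (not from the paper): the reader at the object's own
    location detects it with probability p_hit; every other reader
    reports it with probability p_false.
    """
    total = 0.0
    for reader, seen in enumerate(reads):
        p = p_hit if reader == loc else p_false
        total += math.log(p if seen else 1.0 - p)
    return total


def mh_with_constraints(readings, capacity, steps=20000,
                        p_hit=0.8, p_false=0.1, seed=0):
    """Estimate P(object i is at location j) under a capacity constraint.

    readings[i][j] is 1 if the reader at location j reported object i.
    """
    rng = random.Random(seed)
    n_obj, n_loc = len(readings), len(readings[0])

    # Start from a round-robin assignment and check that it is feasible.
    state = [i % n_loc for i in range(n_obj)]
    counts = [0] * n_loc
    for loc in state:
        counts[loc] += 1
    if max(counts) > capacity:
        raise ValueError("capacity too small for a feasible start state")

    tallies = [[0] * n_loc for _ in range(n_obj)]
    for _ in range(steps):
        # Symmetric proposal: move one random object to a random location.
        obj = rng.randrange(n_obj)
        new, old = rng.randrange(n_loc), state[obj]
        # A proposal that breaks the constraint has zero posterior mass,
        # so it is rejected outright and the chain stays where it is.
        if new != old and counts[new] < capacity:
            log_ratio = (log_likelihood(new, readings[obj], p_hit, p_false)
                         - log_likelihood(old, readings[obj], p_hit, p_false))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                state[obj] = new
                counts[old] -= 1
                counts[new] += 1
        # Record the current state whether or not the move was accepted.
        for i, loc in enumerate(state):
            tallies[i][loc] += 1

    return [[t / steps for t in row] for row in tallies]


# Objects 0 and 1 are seen only by reader 0; object 2 is seen by both.
readings = [[1, 0], [1, 0], [1, 1]]
posterior = mh_with_constraints(readings, capacity=2)
print(posterior)
```

In this toy input, object 2 is read equally by both readers, so the readings alone leave its location at even odds. Because at most two objects may share a location and objects 0 and 1 are very likely at location 0, the constrained posterior places object 2 at location 1 with probability of roughly 0.95 (by exact enumeration of the six feasible states under the assumed rates). The sketch assumes the feasible states are connected by single-object moves, which holds here but is not guaranteed for tight capacities.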
