A deferred cleansing method for RFID data analytics

Radio Frequency Identification (RFID) is gaining broader adoption in many areas. One of the challenges in implementing an RFID-based system is dealing with anomalies in RFID reads. A small number of anomalies can translate into large errors in analytical results. Conventional "eager" approaches cleanse all data upfront and then run queries over the cleaned data. However, this is not feasible when several applications define anomalies and corrections differently over the same data set and not all anomalies can be defined beforehand. Anomalies must then be handled at query time. We introduce a deferred approach for detecting and correcting RFID data anomalies. Each application specifies the detection and correction of the anomalies relevant to it using declarative sequence-based rules. An application query is then automatically rewritten, based on the cleansing rules that the application has specified, to provide answers over cleaned data. We show that a naive approach to deferred cleansing, which applies rules without leveraging query information, can be prohibitively expensive. We develop two novel rewrite methods, both of which reduce the amount of data to be cleaned by exploiting predicates in application queries while guaranteeing correct answers. We leverage standardized SQL/OLAP functionality to implement rules specified in a declarative sequence-based language. This allows cleansing rules to be evaluated efficiently using the existing query processing capabilities of a DBMS. Our experimental results show that deferred cleansing is affordable for typical analytic queries over RFID data.
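The contrast between naive deferred cleansing and a predicate-aware rewrite can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Read` record, the duplicate-read rule, the 5-second window, and the function names are hypothetical, and plain Python stands in for the SQL/OLAP implementation described above. It is not the paper's rule language or either of its rewrite methods. The sketch relies on one property: the rule inspects only reads of the same tag, so a query predicate on the tag can be applied before cleansing without changing the answer.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List


@dataclass(frozen=True)
class Read:
    epc: str      # tag identifier (hypothetical field name)
    reader: str   # reader / location that produced the read
    ts: int       # read time in seconds


def drop_duplicates(reads: Iterable[Read], window: int = 5) -> List[Read]:
    """Hypothetical sequence-based rule: within each tag's time-ordered
    sequence, drop a read that repeats the previous read's reader within
    `window` seconds."""
    cleaned: List[Read] = []
    ordered = sorted(reads, key=lambda r: (r.epc, r.ts))
    for _, seq in groupby(ordered, key=lambda r: r.epc):
        prev = None
        for r in seq:
            is_dup = (
                prev is not None
                and r.reader == prev.reader
                and r.ts - prev.ts <= window
            )
            if not is_dup:
                cleaned.append(r)
            prev = r  # compare against the previous raw read, like LAG()
    return cleaned


def naive_query(reads: List[Read], epc: str) -> List[Read]:
    """Naive deferred cleansing: clean every read, then apply the query."""
    return [r for r in drop_duplicates(reads) if r.epc == epc]


def rewritten_query(reads: List[Read], epc: str) -> List[Read]:
    """Predicate-aware variant: filter on the tag first, then clean only
    the surviving reads. Safe here because the rule never looks across tags."""
    return drop_duplicates(r for r in reads if r.epc == epc)


reads = [
    Read("A", "dock", 0), Read("A", "dock", 2), Read("A", "shelf", 10),
    Read("B", "dock", 1), Read("B", "dock", 3), Read("B", "dock", 20),
]

# Both strategies return the same cleaned answer for tag "A";
# the rewritten one cleans 3 reads instead of 6.
print(naive_query(reads, "A") == rewritten_query(reads, "A"))
```

A predicate on a column the rule does depend on across rows (for example, a time range) could not be pushed down this simply, since the rule may need reads just outside the range as context; that is the kind of case a correctness-preserving rewrite has to handle.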
