Warehousing and Mining Massive RFID Data Sets

Radio Frequency Identification (RFID) applications are set to play an essential role in object tracking and supply chain management systems. In the near future, it is expected that every major retailer will use RFID systems to track the movement of products from suppliers to warehouses, store backrooms, and eventually to points of sale. The volume of information generated by such systems can be enormous, as each individual item (a pallet, a case, or an SKU) will leave a trail of data as it moves through different locations. We propose two data models for the management of this data. The first is a path cube that preserves object transition information while allowing multi-dimensional analysis of path-dependent aggregates. The second is a workflow cube that summarizes the major patterns and significant exceptions in the flow of items through the system. The design of our models is based on the following observations: (1) items usually move together in large groups through early stages in the system (e.g., distribution centers) and only in later stages (e.g., stores) do they move in smaller groups; (2) although RFID data is registered at the primitive level, data analysis usually takes place at a higher abstraction level; (3) many items have similar flow patterns and only a relatively small number of them truly deviate from the general trend; and (4) only flow deviations that are non-redundant with respect to previously recorded deviations are interesting. These observations facilitate the construction of highly compressed RFID data warehouses and the exploration of such data warehouses by scalable data mining. In this study, we give a general overview of the principles driving the design of our framework. We believe that warehousing and mining RFID data present an interesting application for advanced data mining.
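
Observation (1) can be illustrated with a small sketch. The code below is not the authors' implementation; it only shows, under assumed inputs, why bulk movement permits compression: items that share the same sequence of stays are stored once per shared path prefix rather than once per item. The `paths` dictionary, the location names, and the function `group_stays` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical cleaned readings (an assumption, not data from the paper):
# item id -> ordered list of stays, each stay is (location, time_in, time_out).
paths = {
    "item1": [("dist_center", 0, 5), ("truck", 5, 8), ("store_A", 8, 20)],
    "item2": [("dist_center", 0, 5), ("truck", 5, 8), ("store_A", 8, 20)],
    "item3": [("dist_center", 0, 5), ("truck", 5, 8), ("store_B", 9, 15)],
}


def group_stays(paths):
    """Map each distinct path prefix to the list of items that share it.

    Items that moved together through the same stays end up in the same
    group, so one record per prefix replaces one record per item per stay.
    """
    groups = defaultdict(list)
    for item, stays in paths.items():
        for depth in range(1, len(stays) + 1):
            prefix = tuple(stays[:depth])
            groups[prefix].append(item)
    return dict(groups)


groups = group_stays(paths)
raw_records = sum(len(stays) for stays in paths.values())
grouped_records = len(groups)
print(raw_records, grouped_records)
```

In this toy input, the three items share the distribution center and truck stays and split only at the stores, so the grouped form needs fewer records than the per-item form. The saving grows with the size of the groups that move together.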