Mining compressed commodity workflows from massive RFID data sets

Radio Frequency Identification (RFID) technology is fast becoming a prevalent tool in tracking commodities in supply chain management applications. The movement of commodities through the supply chain forms a gigantic workflow that can be mined for the discovery of trends, flow correlations and outlier paths, that in turn can be valuable in understanding and optimizing business processes.In this paper, we propose a method to construct compressed probabilistic workflows that capture the movement trends and significant exceptions of the overall data sets, but with a size that is substantially smaller than that of the complete RFID workflow. Compression is achieved based on the following observations: (1) only a relatively small minority of items deviate from the general trend, (2)only truly non-redundant deviations, ie, those that substantially deviate from the previously recorded ones, are interesting, and (3) although RFID data is registered at the primitive level, data analysis usually takes place at a higher abstraction level. Techniques for workflow compression based on non-redundant transition and emission probabilities are derived; and an algorithm for computing approximate path probabilities is developed. Our experiments demonstrate the utility and feasibility of our design, data structure, and algorithms.

[1]  Jiawei Han,et al.  Mining top-k frequent closed patterns without minimum support , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[2]  Dimitrios Gunopulos,et al.  Mining Process Models from Workflow Logs , 1998, EDBT.

[3]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[4]  Boudewijn F. van Dongen,et al.  Discovering Workflow Performance Models from Timed Logs , 2002, EDCIS.

[5]  Wil M. P. van der Aalst,et al.  Process mining: a research agenda , 2004, Comput. Ind..

[6]  Jiawei Han,et al.  Flowcube: constructing RFID flowcubes for multi-dimensional analysis of commodity flows , 2006, VLDB.

[7]  Gustavo Alonso,et al.  A Pipelined Framework for Online Cleaning of Sensor Data Streams , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[8]  Diego Klabjan,et al.  Warehousing and Analyzing Massive RFID Data Sets , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[9]  Colin de la Higuera,et al.  Probabilistic DFA Inference using Kullback-Leibler Divergence and Minimality , 2000, ICML.

[10]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[11]  Minos N. Garofalakis,et al.  Adaptive cleaning for RFID data streams , 2006, VLDB.

[12]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[13]  José Oncina,et al.  Learning Stochastic Regular Grammars by Means of a State Merging Method , 1994, ICGI.

[14]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .