Flowcube: constructing RFID flowcubes for multi-dimensional analysis of commodity flows

With the advent of RFID (Radio Frequency Identification) technology, manufacturers, distributors, and retailers will be able to track the movement of individual objects throughout the supply chain. The volume of data generated by a typical RFID application will be enormous, as each item will generate a complete history of all the individual locations it occupied at every point in time: possibly from a specific production line at a given factory, through multiple warehouses, and all the way to a particular checkout counter in a store. The movement trails of such RFID data form a gigantic commodity flowgraph representing the locations and durations of the path stages traversed by each item. This commodity flow contains rich multi-dimensional information on the characteristics, trends, changes, and outliers of commodity movements. In this paper, we propose a method to construct a warehouse of commodity flows, called a flowcube. As in standard OLAP, the model is composed of cuboids that aggregate item flows at a given abstraction level. The flowcube differs from the traditional data cube in two major ways. First, the measure of each cell is not a scalar aggregate but a commodity flowgraph that captures the major movement trends and significant deviations of the items aggregated in the cell. Second, each flowgraph can itself be viewed at multiple levels by changing the level of abstraction of path stages. We motivate the importance of the model and present an efficient method to compute it by (1) performing simultaneous aggregation of paths to all interesting abstraction levels, (2) pruning low-support path segments along the item and path-stage abstraction lattices, and (3) compressing the cube by removing rarely occurring cells and cells whose commodity flows can be inferred from higher-level cells.
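To make the idea of a flowgraph-valued measure more concrete, here is a minimal sketch under simplifying assumptions. It models a flowgraph as a prefix tree: each node is a location reached through one specific path prefix and records how many items reached it, their stay durations, and how many paths end there, so transition probabilities follow from child counts. The names `FlowNode`, `build_flowgraph`, `abstract_path`, and `prune`, the one-level location `hierarchy`, the `<start>`/`<end>` sentinels, and the toy data are all hypothetical illustrations, not the paper's implementation. In particular, `prune` is a simple iceberg-style filter on a single graph. It stands in for, and does not reproduce, the paper's pruning along the item and path-stage lattices.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FlowNode:
    """A flowgraph node: one location reached through one specific path prefix."""
    location: str
    count: int = 0                                       # items that reached this node
    durations: Counter = field(default_factory=Counter)  # stay duration -> item count
    children: dict = field(default_factory=dict)         # next location -> FlowNode
    terminated: int = 0                                  # items whose path ends here

    def transition_probs(self) -> dict:
        """Fraction of this node's items moving to each next location, or ending here."""
        probs = {loc: c.count / self.count for loc, c in self.children.items()}
        if self.terminated:
            probs["<end>"] = self.terminated / self.count
        return probs


def abstract_path(path, hierarchy):
    """Roll each location up via `hierarchy` (assumed one-level mapping), merging
    consecutive stages that collapse to the same abstract location (durations summed)."""
    merged = []
    for loc, dur in path:
        up = hierarchy.get(loc, loc)  # locations missing from the map stay as-is
        if merged and merged[-1][0] == up:
            merged[-1] = (up, merged[-1][1] + dur)
        else:
            merged.append((up, dur))
    return merged


def build_flowgraph(paths):
    """Aggregate (location, duration) paths into a tree-shaped flowgraph."""
    root = FlowNode("<start>")
    for path in paths:
        node = root
        node.count += 1
        for loc, dur in path:
            node = node.children.setdefault(loc, FlowNode(loc))
            node.count += 1
            node.durations[dur] += 1
        node.terminated += 1
    return root


def prune(node, min_support):
    """Drop subtrees reached by fewer than `min_support` items (mutates in place).
    A child's count never exceeds its parent's, so cutting at the first
    infrequent node loses no frequent descendant."""
    node.children = {loc: c for loc, c in node.children.items() if c.count >= min_support}
    for child in node.children.values():
        prune(child, min_support)
    return node


# Hypothetical toy data: three items; `hierarchy` rolls locations up one level.
paths = [
    [("factory", 2), ("dock_A", 1), ("shelf", 5)],
    [("factory", 2), ("dock_B", 1), ("shelf", 3)],
    [("factory", 3), ("dock_A", 2), ("checkout", 1)],
]
hierarchy = {"dock_A": "dist_center", "dock_B": "dist_center",
             "shelf": "store", "checkout": "store"}

raw = build_flowgraph(paths)
coarse = build_flowgraph([abstract_path(p, hierarchy) for p in paths])
```

On this toy data the raw flowgraph splits the factory's three items 2/3 to `dock_A` and 1/3 to `dock_B`, while the abstracted graph collapses all three into a single factory → `dist_center` → `store` chain. Applying `prune` with `min_support=2` to a fresh raw graph keeps only the factory → `dock_A` branch. Building one graph per abstraction level, as here, is the naive approach. The abstract's point (1) instead aggregates paths to all interesting levels simultaneously.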
