Mining massive moving object datasets: from RFID flow analysis to traffic mining

Effective management of moving object data originating in supply chain operations, road network monitoring, and other RFID applications is a major challenge facing society today, with important implications for business optimization, city planning, privacy, and national security. Toward a solution to this problem, I have developed a comprehensive framework for warehousing, mining, and cleaning large moving object data sets. At the core of my dissertation is the RFID data warehousing engine. It receives clean data from the cleaning engine and provides highly compressed data, at multiple levels of abstraction, to the mining engine. The mining engine is composed of three modules. The first mines commodity flow patterns that identify general flow trends and significant flow exceptions in a large supply chain operation. The second makes route recommendations based on observed driving behavior and traffic conditions. The third discovers and characterizes a wide variety of traffic anomalies on a road network.

RFID data warehousing. A data warehouse is an enterprise-level data repository that collects and integrates organizational data in order to provide decision support analysis. At the core of the data warehouse is the data cube, which computes an aggregate measure (e.g., sum, avg, count) for all possible combinations of dimensions of a fact table (e.g., sales for 2004 in the northeast). Online analytical processing (OLAP) operations provide the means for exploration and analysis of the data cube. My research in this direction has extended the data cube to handle moving object data sets [42] by significantly compressing such data and by proposing a new aggregation mechanism that preserves its path structure.

RFID data cleaning. We propose a cleaning framework that takes an RFID data set and a collection of cleaning methods with associated costs, and induces a cleaning plan that optimizes the overall accuracy-adjusted cleaning costs. The cleaning plan determines the conditions under which inexpensive cleaning methods can be safely applied, the conditions under which more expensive methods are absolutely necessary, and the cases in which a combination of several methods is the optimal policy.

Mining flow trends. An important application of moving object data is mining the movement patterns of objects in supply chain operations. Creating a complete workflow that records all possible commodity movements and incorporates time would be prohibitively expensive, since there can be billions of different location and time combinations. I propose the FlowGraph [41], a compressed probabilistic workflow that captures the general flow trends and significant exceptions of a data set. The FlowGraph achieves compression by recording the set of major flow trends and the set of non-redundant flow exceptions (i.e., abnormal transitions or durations) present in the data.

Mining route recommendations. Most existing route planning applications use a fastest path algorithm based on static or dynamic models of road speeds, but such models generally disregard observed driver behavior and other important factors such as weather, car-pool availability, or vehicle type. We propose a traffic-mining-based path-finding method [43] that mines speed and driving models from historic traffic data and uses them to compute fast routes that are well supported by historic driving behavior under the set of relevant driving and traffic conditions.

Mining traffic anomalies. Identification and characterization of traffic anomalies on massive road networks is a vital component of traffic monitoring [44]. Anomaly identification can be used to reduce congestion, increase safety, and provide transportation engineers with better information for traffic forecasting and road network design. However, due to the size, complexity, and dynamics of such transportation networks, it is challenging to automate the process. We propose a multi-dimensional mining framework that can be used to identify a concise set of anomalies from massive traffic monitoring data, and to further overlay, contrast, and explore such anomalies in multi-dimensional space.
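
As a rough illustration of the plain data cube mentioned in the abstract (an aggregate computed for every combination of dimensions of a fact table), the following Python sketch sums a measure over each subset of dimensions. The `data_cube` function, the `sales` fact table, and its column names are hypothetical and chosen only for this example. The sketch does not attempt the compression or the path-preserving aggregation that the dissertation develops for moving object data.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (one group-by per subset)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cells = defaultdict(int)
            for row in rows:
                # The cell key is the tuple of values for the grouped dimensions.
                key = tuple(row[d] for d in group)
                cells[key] += row[measure]
            cube[group] = dict(cells)
    return cube


# Hypothetical fact table with two dimensions and one measure.
sales = [
    {"year": 2004, "region": "northeast", "amount": 10},
    {"year": 2004, "region": "south", "amount": 5},
    {"year": 2005, "region": "northeast", "amount": 7},
]

cube = data_cube(sales, ("year", "region"), "amount")
print(cube[("year", "region")][(2004, "northeast")])  # sales for 2004 in the northeast
print(cube[("year",)][(2004,)])                       # all sales for 2004
print(cube[()][()])                                   # grand total
```

With two dimensions the cube holds four group-bys (the empty one being the grand total). The number of group-bys doubles with each added dimension, which is one reason compression matters for large data sets.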

[1]  M. Parent,et al.  Rule based prediction of fastest paths on urban networks , 2005, Proceedings. 2005 IEEE Intelligent Transportation Systems, 2005.

[2]  Panos Kalnis,et al.  Efficient OLAP Operations in Spatial Data Warehouses , 2001, SSTD.

[3]  Eamonn J. Keogh,et al.  An Enhanced Representation of Time Series Which Allows Fast and Accurate Classification, Clustering and Relevance Feedback , 1998, KDD.

[4]  Jiawei Han,et al.  Mining compressed commodity workflows from massive RFID data sets , 2006, CIKM '06.

[5]  Samuel C Tignor,et al.  Freeway Incident-Detection Algorithms Based on Decision Trees with States , 1978 .

[6]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[7]  J. Ross Quinlan,et al.  Improved Use of Continuous Attributes in C4.5 , 1996, J. Artif. Intell. Res.

[8]  Hans-Peter Kriegel,et al.  Incremental Clustering for Mining in a Data Warehousing Environment , 1998, VLDB.

[9]  Laurence R. Rilett,et al.  Heuristic shortest path algorithms for transportation applications: State of the art , 2006, Comput. Oper. Res.

[10]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[11]  Tapio Elomaa,et al.  General and Efficient Multisplitting of Numerical Attributes , 1999, Machine Learning.

[12]  Valerie King,et al.  Fully dynamic algorithms for maintaining all-pairs shortest paths and transitive closure in digraphs , 1999, 40th Annual Symposium on Foundations of Computer Science.

[13]  Minos N. Garofalakis,et al.  Adaptive cleaning for RFID data streams , 2006, VLDB.

[14]  Padhraic Smyth,et al.  A general probabilistic framework for clustering individuals and objects , 2000, KDD '00.

[15]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[16]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .

[17]  Jiawei Han,et al.  Mining top-k frequent closed patterns without minimum support , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.

[18]  Jiawei Han,et al.  Adaptive Fastest Path Computation on a Road Network: A Traffic Mining Approach , 2007, VLDB.

[19]  Johannes Gehrke,et al.  RainForest - A Framework for Fast Decision Tree Construction of Large Datasets , 1998, Data Mining and Knowledge Discovery.

[20]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[21]  Daniele Frigioni,et al.  Experimental analysis of dynamic algorithms for the single source shortest paths problem , 1998, JEA.

[22]  James Brusey,et al.  Reasoning about uncertainty in location identification with auto-ID , 2003 .

[23]  Jeffrey F. Naughton,et al.  An array-based algorithm for simultaneous multidimensional aggregates , 1997, SIGMOD '97.

[24]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .

[25]  Christian Floerkemeier,et al.  Issues with RFID Usage in Ubiquitous Computing Applications , 2004, Pervasive.

[26]  Jiawei Han,et al.  Selective Materialization: An Efficient Method for Spatial Data Cube Construction , 1998, PAKDD.

[27]  Jiawei Han,et al.  Cost-Conscious Cleaning of Massive RFID Data Sets , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[28]  Colin de la Higuera,et al.  Probabilistic DFA Inference using Kullback-Leibler Divergence and Minimality , 2000, ICML.

[29]  Karl Petty,et al.  Incidents on the Freeway: Detection and Management , 1997 .

[30]  Jeffrey F. Naughton,et al.  On the Computation of Multidimensional Aggregates , 1996, VLDB.

[31]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[32]  Giuseppe F. Italiano,et al.  Experimental analysis of dynamic all pairs shortest path algorithms , 2004, SODA '04.

[33]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[34]  Alexander Skabardonis,et al.  Detecting Errors and Imputing Missing Data for Single-Loop Surveillance Systems , 2003 .

[35]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[36]  A E Pisarski,et al.  National Transportation Statistics , 2000 .

[37]  Jian Pei,et al.  Mining frequent patterns by pattern-growth: methodology and implications , 2000, SIGKDD Explor.

[38]  Arnold P. Boedihardjo,et al.  AOID: adaptive on-line incident detection system , 2006, 2006 IEEE Intelligent Transportation Systems Conference.

[39]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[40]  Peter Sanders,et al.  Highway Hierarchies Star , 2006, The Shortest Path Problem.

[41]  Peter D. Turney.  Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm , 1994, J. Artif. Intell. Res.

[42]  Nils J. Nilsson,et al.  A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern.

[43]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[44]  Yang Du,et al.  Finding Fastest Paths on A Road Network with Speed Patterns , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[45]  Yan Huang,et al.  Discovering Spatial Co-location Patterns: A Summary of Results , 2001, SSTD.

[46]  Peter Sanders,et al.  Highway Hierarchies Hasten Exact Shortest Path Queries , 2005, ESA.

[47]  Jiawei Han,et al.  Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration , 2003, Very Large Data Bases Conference.

[48]  U. Brandes.  A faster algorithm for betweenness centrality , 2001 .

[49]  Dipti Srinivasan,et al.  Support vector machine models for freeway incident detection , 2003, Proceedings of the 2003 IEEE International Conference on Intelligent Transportation Systems.

[50]  Sakti Pramanik,et al.  HiTi graph model of topographical road maps in navigation systems , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[51]  E.C.-P. Chang,et al.  Fuzzy systems based automatic freeway incident detection , 1994, Proceedings of IEEE International Conference on Systems, Man and Cybernetics.

[52]  Eric Horvitz,et al.  Prediction, Expectation, and Surprise: Methods, Designs, and Study of a Deployed Traffic Forecasting Service , 2005, UAI.

[53]  J A Martin,et al.  Automatic Incident Detection - TRRL Algorithms HIOCC and PATREG , 1979 .

[54]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[55]  José Oncina,et al.  Learning Stochastic Regular Grammars by Means of a State Merging Method , 1994, ICGI.

[56]  Jiawei Han,et al.  Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes , 2000, IEEE Trans. Knowl. Data Eng.

[57]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[58]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[59]  Adolf D May,et al.  Freeway Detector Data Analysis: Smart Corridor Simulation Evaluation , 1993 .

[60]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[61]  P S Parsonson,et al.  Traffic Detector Handbook , 1985 .

[62]  Padhraic Smyth,et al.  Trajectory clustering with mixtures of regression models , 1999, KDD '99.

[63]  Robert L. Smith,et al.  Approximating Shortest Paths in Large-Scale Networks with an Application to Intelligent Transportation Systems , 1998, INFORMS J. Comput.

[64]  Qiming Chen,et al.  PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[65]  Bing Jiang,et al.  I Sense a Disturbance in the Force: Unobtrusive Detection of Interactions with RFID-tagged Objects , 2004, UbiComp.

[66]  Wil M. P. van der Aalst,et al.  Process mining: a research agenda , 2004, Comput. Ind.

[67]  Jiawei Han,et al.  Discovery of Spatial Association Rules in Geographic Information Databases , 1995, SSD.

[68]  J. MacQueen.  Some methods for classification and analysis of multivariate observations , 1967 .

[69]  Dipti Srinivasan,et al.  Development and Adaptation of Constructive Probabilistic Neural Network in Freeway Incident Detection , 2002 .

[70]  Gustavo Alonso,et al.  A Pipelined Framework for Online Cleaning of Sensor Data Streams , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[71]  Dimitrios Gunopulos,et al.  Mining Process Models from Workflow Logs , 1998, EDBT.

[72]  S. Pallottino,et al.  Shortest Path Algorithms in Transportation models: classical and innovative aspects , 1997 .

[73]  Jiawei Han,et al.  Flowcube: constructing RFID flowcubes for multi-dimensional analysis of commodity flows , 2006, VLDB.

[74]  Jeffrey F. Naughton,et al.  Materialized View Selection for Multidimensional Datasets , 1998, VLDB.

[75]  Angshuman Guin,et al.  An Incident Detection Algorithm Based On a Discrete State Propagation Model of Traffic Flow , 2004 .

[76]  Surajit Chaudhuri,et al.  An overview of data warehousing and OLAP technology , 1997, SIGMOD Rec.

[77]  Lester A Hoel,et al.  Traffic and Highway Engineering, Third Edition , 2002 .

[78]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst.