Graph dependency construction based on interval-event dependencies detection in data streams

Pattern mining over data streams is critical to a variety of applications, such as understanding and predicting weather phenomena or outdoor surveillance. Most current techniques attempt to discover relationships between time-point events but are not practical for discovering dependencies over interval-based events. In this work, we present a new approach to mining dependencies between streams of interval-based events. It links two events if they occur in a similar manner, one often being followed by the other after a certain time interval in the data. The proposed method is robust to the temporal variability of events and determines the most appropriate time intervals, whose validity is assessed by a χ² test. As several intervals may redundantly describe the same dependency, the approach retrieves only the most specific intervals with respect to a dominance relationship over temporal dependencies, and thus avoids the classical problem of pattern flooding in data mining. The TEDDY algorithm (TEmporal Dependency DiscoverY) prunes the search space while guaranteeing the discovery of all valid and significant temporal dependencies. We present empirical results on simulated and real-life data to show the scalability and robustness of our approach. The dependency relationships define a graph that supports intelligent analysis, as illustrated by two case studies: outdoor surveillance of a building via video camera and motion sensors, and assistance for road deicing operations based on humidity and temperature measurements at the urban scale. These applications demonstrate the efficiency and effectiveness of our approach.
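
To make the idea of a lagged interval dependency concrete, the following is a minimal illustrative sketch and not the TEDDY algorithm itself. It assumes integer time ticks, half-open intervals, and a fixed candidate lag window `[lo, hi]`; the function names and the use of a 2x2 Pearson χ² statistic with the 5% critical value for one degree of freedom (3.841) are assumptions made for illustration. The statistic only measures association strength, so a real test would also check its direction and account for the fact that consecutive ticks are not independent samples.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in integer time ticks


def to_mask(intervals: List[Interval], horizon: int) -> List[bool]:
    """Mark every tick in [0, horizon) covered by at least one interval."""
    mask = [False] * horizon
    for start, end in intervals:
        for t in range(max(start, 0), min(end, horizon)):
            mask[t] = True
    return mask


def shifted_mask(mask: List[bool], lo: int, hi: int) -> List[bool]:
    """Tick t is True if the source was active at some t - d, lo <= d <= hi."""
    n = len(mask)
    out = [False] * n
    for t, active in enumerate(mask):
        if active:
            for d in range(lo, hi + 1):
                if 0 <= t + d < n:
                    out[t + d] = True
    return out


def chi2_2x2(x: List[bool], y: List[bool]) -> float:
    """Pearson chi-squared statistic of the 2x2 contingency table of x and y."""
    n = len(x)
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: one stream is constant
    return n * (a * d - b * c) ** 2 / denom


def dependency_score(a_iv: List[Interval], b_iv: List[Interval],
                     lo: int, hi: int, horizon: int) -> float:
    """Score how well stream A, delayed by a lag in [lo, hi], matches stream B."""
    a_shifted = shifted_mask(to_mask(a_iv, horizon), lo, hi)
    return chi2_2x2(a_shifted, to_mask(b_iv, horizon))


# Hypothetical toy streams: every B interval starts 5 ticks after an A interval.
A = [(0, 2), (10, 12), (20, 22)]
B = [(5, 7), (15, 17), (25, 27)]
CRITICAL_5PCT_1DF = 3.841  # chi-squared critical value, 1 degree of freedom

score_lag5 = dependency_score(A, B, 5, 5, 30)
score_lag0 = dependency_score(A, B, 0, 0, 30)
print(score_lag5 > CRITICAL_5PCT_1DF, score_lag0 > CRITICAL_5PCT_1DF)
```

On this toy input the lag window `[5, 5]` clears the threshold while `[0, 0]` does not, which is the kind of contrast a search over candidate lag intervals would exploit.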