Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems

Cyber-Physical Systems (CPSs) employ sensors to observe physical environments and to detect events of interest. Equipped with sensing, computing, and communication capabilities, CPSs aim to make physical systems smarter. For example, smart electricity meters now measure and report power consumption as well as critical events such as power outages. However, such sensors report a variety of warnings and errors every day, and many of these merely indicate transient faults or short instabilities of the physical system or its environment. Given the large volumes of data, time-efficient processing of these events is a key challenge in CPSs, especially in large-scale scenarios with hundreds of thousands of sensors. Motivated by the fact that critical events in CPSs often have temporal-spatial properties, we focus on identifying critical events through an online temporal-spatial analysis of the message stream. We explicitly model the online detection problem as single-linkage clustering on a data stream over a sliding window, and we derive the inherent computational complexity of the detection problem. Based on this model, we propose a grid-based single-linkage clustering algorithm over a sliding window, an online method that is efficient in both time and space and meets the fast-processing demands of big data streams. We evaluate the proposed approach both analytically, through a series of propositions, and empirically, on a large real-world data set from a deployed CPS comprising 300,000 sensors and covering one year. We show that the proposed method identifies more than 95% of the critical events in the data set and reduces the time and space requirements by four orders of magnitude compared with the conventional clustering method.
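To make the idea of grid-based single-linkage clustering over a sliding window concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that events are (timestamp, x, y) tuples arriving in non-decreasing time order, that two events are linked when their Euclidean distance is at most a threshold `eps`, and that a cluster is reported as a critical-event candidate once it contains at least `min_size` events. The names `cluster_window`, `SlidingWindowDetector`, `eps`, `window`, and `min_size` are hypothetical and chosen only for illustration. For simplicity, the sketch re-clusters the whole window on every arrival, which a genuinely online method would avoid.

```python
from collections import defaultdict, deque
from math import floor, hypot


def cluster_window(events, eps):
    """Group (t, x, y) events into single-linkage clusters cut at distance eps.

    Illustrative assumption: clusters are the connected components of the
    graph that links any two events whose spatial distance is <= eps.
    """
    parent = list(range(len(events)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grid = defaultdict(list)  # cell coordinates -> indices of events in that cell
    for i, (_, x, y) in enumerate(events):
        cx, cy = floor(x / eps), floor(y / eps)
        # With cell size eps, any event within eps lies in the 3x3 neighbourhood.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if hypot(x - events[j][1], y - events[j][2]) <= eps:
                        parent[find(i)] = find(j)
        grid[(cx, cy)].append(i)

    clusters = defaultdict(list)
    for i, event in enumerate(events):
        clusters[find(i)].append(event)
    return list(clusters.values())


class SlidingWindowDetector:
    """Keeps the events of the last `window` time units and clusters them."""

    def __init__(self, window, eps, min_size):
        self.window = window
        self.eps = eps
        self.min_size = min_size
        self.events = deque()

    def push(self, t, x, y):
        """Add one event and return the clusters with at least min_size events."""
        self.events.append((t, x, y))
        # Expire events that have fallen out of the sliding window.
        while self.events[0][0] <= t - self.window:
            self.events.popleft()
        clusters = cluster_window(list(self.events), self.eps)
        return [c for c in clusters if len(c) >= self.min_size]
```

A caller would create one detector, for example `SlidingWindowDetector(window=10, eps=0.6, min_size=3)`, and call `push` for each incoming message. The grid limits distance computations to neighbouring cells, which is the property that makes a grid attractive when most events are spatially far apart.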
