Safe-Zones for Monitoring Distributed Streams

In many emerging applications, the data to be monitored is high-volume, dynamic, and distributed, making it infeasible to collect the distinct data streams at a central node and process them there. Often, the monitoring problem consists of determining whether the value of a global function, which depends on the union of all streams, has crossed a certain threshold. A great deal of effort has been directed at reducing communication overhead by transforming the monitoring of the global function into the testing of local constraints, checked independently at the nodes. Recently, geometric monitoring (GM) has proved very useful for constructing such local constraints for general (non-linear, non-monotonic) functions. However, in all current variants of geometric monitoring, the constraints at all nodes share an identical structure and are thus unsuitable for handling heterogeneous streams, which obey different distributions at the distinct nodes. To remedy this, we propose a general approach for geometric monitoring of heterogeneous streams (HGM), which defines constraints tailored to fit the distinct data distributions at the nodes. While optimally selecting the constraints is an NP-hard problem, we provide a practical solution, which seeks to reduce running time by hierarchically clustering nodes with similar data distributions and then solving more, but simpler, optimization problems. Experiments are provided to support the validity of the proposed approach.
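To make the idea of local constraints concrete, the following is a minimal sketch of the identical-constraint setting that the abstract contrasts with HGM; it is not the HGM method itself. It assumes, purely for illustration, that the global function is evaluated at the average of the nodes' local vectors, that the function is f(x) = x0 * x1 with threshold 1, and that every node uses the same convex safe zone: a ball of radius sqrt(2) around the origin. That ball lies inside the admissible region because x0 * x1 <= (x0^2 + x1^2) / 2. All names (`f`, `in_safe_zone`, `violating_nodes`) and numbers are hypothetical.

```python
import math

# Hypothetical monitored function (non-linear, non-monotonic) and threshold.
def f(x):
    return x[0] * x[1]

THRESHOLD = 1.0

# Assumed shared safe zone: a ball around the origin with radius sqrt(2).
# Inside it, x0 * x1 <= (x0**2 + x1**2) / 2 <= 1, so f stays below the threshold.
CENTER = (0.0, 0.0)
RADIUS = math.sqrt(2.0)

def in_safe_zone(v):
    """Local constraint: checked at a node without any communication."""
    dist = math.sqrt(sum((a - c) ** 2 for a, c in zip(v, CENTER)))
    return dist <= RADIUS

def average(vectors):
    """Global vector, assumed here to be the plain average of the local vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / n for k in range(dim)]

def violating_nodes(local_vectors):
    """Indices of nodes whose local constraint fails and that must report."""
    return [i for i, v in enumerate(local_vectors) if not in_safe_zone(v)]

# All local vectors lie in the convex safe zone, so their average does too,
# and the global condition holds without any node sending data.
quiet = [(1.0, 0.5), (-0.5, 1.0), (0.2, -1.2)]
assert violating_nodes(quiet) == []
assert f(average(quiet)) <= THRESHOLD

# Node 0 leaves the safe zone and must communicate, even though the global
# function is still below the threshold (a false alarm).
noisy = [(2.0, 0.0), (1.0, 0.5)]
assert violating_nodes(noisy) == [0]
```

Because the safe zone is convex, the average of vectors that all lie inside it also lies inside it, which is what allows silent local checks. The false alarm in the second case hints at why a single shared zone can be wasteful when the nodes' data follow different distributions.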
