Lightweight Monitoring of Distributed Streams

As data becomes dynamic, large, and distributed, there is increasing demand for what have become known as distributed stream algorithms. Since continuously collecting the data at a central server and processing it there is infeasible, a common approach is to define local conditions at the distributed nodes such that, as long as they are maintained, some desirable global condition holds. Previous methods derived local conditions with a focus on communication efficiency. While very useful for reducing communication volume, these local conditions often impose a heavy computational burden on the nodes. The computational complexity of the local conditions affects both runtime and energy consumption, which are especially critical for resource-limited devices such as smartphones and sensor nodes. Such devices are becoming increasingly ubiquitous due to the recent trend toward smart cities and the Internet of Things. To accommodate the high data rates and limited resources of these devices, it is crucial that the local conditions can be evaluated quickly and efficiently. Here we propose a novel approach, designated CB (for Convex/Concave Bounds). CB defines local conditions using suitably chosen convex and concave functions. Lightweight and simple, these local conditions can be rapidly checked on the fly. CB's superiority over the state of the art is demonstrated by its reduced runtime and power consumption, by up to six orders of magnitude in some cases. As an added bonus, CB also reduced communication overhead in all the tested application scenarios.
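To make the idea of a convex-bound local condition concrete, here is a minimal sketch under stated assumptions; it is not the paper's implementation. It assumes the global condition is f(average of the local vectors) <= T, picks the illustrative non-convex function f(x, y) = x*y, and uses the convex upper bound c(x, y) = (x^2 + y^2)/2, which dominates f because (x - y)^2 >= 0. If every node satisfies c(v_i) <= T, then by Jensen's inequality c(average) <= average of c(v_i) <= T, so f(average) <= T also holds. All function names and the sample data are hypothetical.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def f(v: Vec) -> float:
    """Illustrative monitored function (non-convex): f(x, y) = x * y."""
    return v[0] * v[1]


def convex_upper_bound(v: Vec) -> float:
    """Convex function c with c(v) >= f(v) everywhere: (x^2 + y^2) / 2."""
    return 0.5 * (v[0] ** 2 + v[1] ** 2)


def local_condition_holds(v: Vec, threshold: float) -> bool:
    """Cheap per-node check: only evaluates the convex bound at the local vector."""
    return convex_upper_bound(v) <= threshold


def average(vectors: Sequence[Vec]) -> Vec:
    """Component-wise mean of the local vectors (the assumed global vector)."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def global_condition_holds(vectors: Sequence[Vec], threshold: float) -> bool:
    """Ground-truth check a central server would perform with all the data."""
    return f(average(vectors)) <= threshold


# Hypothetical local vectors at three nodes and a hypothetical threshold.
nodes = [(1.0, 2.0), (2.0, 1.0), (0.5, 1.5)]
T = 3.0

all_local_ok = all(local_condition_holds(v, T) for v in nodes)
global_ok = global_condition_holds(nodes, T)
```

The local check is conservative: a node holding (3.0, 0.0) would violate its local condition (c = 4.5 > 3) even though f is 0 there, which would trigger communication without the global condition necessarily being violated.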
