Anarchists, Unite: Practical Entropy Approximation for Distributed Streams

Entropy is a fundamental property of data and a key metric in many scientific and engineering fields. Entropy estimation has been studied extensively, but almost always under the assumption that there is a single data stream, seen in its entirety by the one node that runs the estimation algorithm. Multiple distributed data sources are becoming increasingly common, however, with applications in signal processing, computer science, medicine, physics, and more. Centralizing all data can be infeasible, for example in networks of battery- or bandwidth-limited sensors, so entropy estimation over distributed streams requires new, communication-efficient approaches. We propose a practical, communication-efficient algorithm for continuously approximating the entropy of distributed streams, with deterministic, user-defined error bounds. Unlike previous streaming methods, it supports deletions and variable-sized, time-based sliding windows, while still avoiding communication when possible. Moreover, it can optionally incorporate a state-of-the-art entropy sketch, which both reduces bandwidth and makes it possible to monitor very high-dimensional problems. Finally, it provides the approximation to all nodes rather than to a single centralized location, which is important in settings such as wireless sensor networks. Evaluation on several public datasets from real application domains shows that our adaptive algorithm can often reduce the number of messages by two orders of magnitude compared to centralizing all data in one node.
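To make the setting concrete, the following is a minimal Python sketch of the centralized baseline the abstract compares against, not of the proposed algorithm itself. It assumes that each node keeps a local frequency vector, that deletions simply decrement counts, and that the monitored quantity is the empirical Shannon entropy of the merged counts. The names `Node`, `shannon_entropy`, and `centralized_entropy` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Empirical Shannon entropy, in bits, of a frequency vector {item: count}."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each item contributes p * log2(1/p), where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)


class Node:
    """Hypothetical stream node holding a local frequency vector."""

    def __init__(self):
        self.counts = Counter()

    def insert(self, item):
        self.counts[item] += 1

    def delete(self, item):
        # Assumption: a deletion removes one earlier occurrence of the item.
        if self.counts[item] <= 0:
            raise ValueError(f"cannot delete absent item: {item!r}")
        self.counts[item] -= 1
        if self.counts[item] == 0:
            del self.counts[item]


def centralized_entropy(nodes):
    """Baseline: merge every local vector in one place, then compute entropy."""
    merged = Counter()
    for node in nodes:
        merged.update(node.counts)
    return shannon_entropy(merged)


if __name__ == "__main__":
    a, b = Node(), Node()
    for item in ("x", "x"):
        a.insert(item)
    for item in ("y", "y"):
        b.insert(item)
    print(centralized_entropy([a, b]))  # two equally likely items
    b.delete("y")
    b.delete("y")
    print(centralized_entropy([a, b]))  # only "x" remains
```

In this baseline every local update has to reach the merging node before the entropy can be recomputed, which is the communication cost a distributed monitoring approach tries to avoid.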
