One for All and All for One: Simultaneous Approximation of Multiple Functions over Distributed Streams

Distributed monitoring methods address the difficult problem of continuously approximating functions over distributed streams while minimizing communication cost. However, existing methods approximate only a single function at a time. Using them to track multiple functions multiplies the communication volume, which negates the advantage they were designed to provide. We introduce a novel approach that can be applied to multiple functions simultaneously. Our method applies a communication reduction scheme to the set of functions as a whole, rather than to each function independently, which keeps the communication volume low. Evaluation on several real-world datasets shows that our method can track many functions with reduced communication, in most cases incurring only a negligible increase in communication over distributed approximation of a single function.
