Optimal load shedding with aggregates and mining queries

To cope with bursty arrivals of high-volume data, a data stream management system (DSMS) has to shed load while minimizing the degradation of Quality of Service (QoS). In this paper, we show that this problem can be formalized as a classical optimization task from operations research, in ways that accommodate different requirements for multiple users, different query sensitivities to load shedding, and different penalty functions. Standard non-linear programming algorithms are adequate for non-critical situations, but for severe overloads we propose a more efficient algorithm that runs in linear time without compromising optimality. Our approach is applicable to a large class of queries, including traditional SQL aggregates, statistical aggregates (e.g., quantiles), and data mining functions such as k-means clustering, naive Bayesian classifiers, decision trees, and frequent pattern discovery (where a different error bound can even be specified for each pattern). In fact, we show that these aggregate queries are special instances of a broader class of functions that we call reciprocal-error aggregates, to which the proposed methods apply in full generality. Finally, we propose a novel architecture for supporting load shedding in an extensible system where users can write arbitrary User-Defined Aggregates (UDAs), and we confirm our analytical findings with several experiments executed on an actual DSMS.
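
To make the optimization view above concrete, the following Python sketch solves a simplified instance of the problem. It assumes, hypothetically, that query i sampled at rate p_i incurs an error of a_i / p_i, scaled by a user-assigned importance w_i, and that processing it consumes c_i * p_i out of a total capacity C. Minimizing Σ w_i a_i / p_i subject to Σ c_i p_i ≤ C and 0 < p_i ≤ 1 is then a separable convex resource allocation problem. Its KKT conditions give p_i = min(1, t * k_i) with k_i = sqrt(w_i a_i / c_i), and the code finds the multiplier t by water-filling over the k_i sorted in descending order. The names `Query` and `allocate_sampling_rates`, the error model, and the cost model are illustrative assumptions. This is not the paper's linear-time algorithm: the sort alone makes this sketch O(n log n).

```python
import math
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical per-query parameters for the assumed error/cost model."""
    weight: float      # w_i: user-assigned importance of the query
    error_coef: float  # a_i: error is assumed to be a_i / p_i at sampling rate p_i
    cost: float        # c_i: load contributed at p_i = 1 (load is c_i * p_i)


def allocate_sampling_rates(queries, capacity):
    """Minimize sum(w_i * a_i / p_i) s.t. sum(c_i * p_i) <= capacity, 0 < p_i <= 1."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if any(q.weight <= 0 or q.error_coef <= 0 or q.cost <= 0 for q in queries):
        raise ValueError("weights, error coefficients and costs must be positive")
    if sum(q.cost for q in queries) <= capacity:
        return [1.0] * len(queries)  # no overload: nothing needs to be shed

    # KKT conditions: each rate is p_i = min(1, t * k_i) with k_i = sqrt(w_i a_i / c_i).
    k = [math.sqrt(q.weight * q.error_coef / q.cost) for q in queries]
    # Queries with larger k_i hit the cap p_i = 1 first as t grows.
    order = sorted(range(len(queries)), key=lambda i: k[i], reverse=True)

    saturated_cost = 0.0  # load of queries already fixed at p_i = 1
    # Sum of c_i * k_i over the queries that are not (yet) saturated.
    remaining_ck = sum(q.cost * ki for q, ki in zip(queries, k))
    for j, i in enumerate(order):
        # Spend the leftover capacity exactly on the unsaturated queries.
        t = (capacity - saturated_cost) / remaining_ck
        if t * k[i] <= 1.0:  # the largest unsaturated rate fits under the cap
            rates = [1.0] * len(queries)  # order[:j] stay saturated at 1.0
            for r in order[j:]:
                rates[r] = t * k[r]
            return rates
        # Query i would exceed p_i = 1, so fix it at 1 and redistribute.
        saturated_cost += queries[i].cost
        remaining_ck -= queries[i].cost * k[i]
    raise AssertionError("unreachable: total cost exceeds capacity")
```

For example, two unit-cost queries with weights 100 and 1 (and a_i = 1) sharing a capacity of 1.5 receive rates 1.0 and 0.5: the more important query is processed in full and the other absorbs all of the shedding.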
