Load shedding in stream databases: a control-based approach

In Data Stream Management Systems (DSMSs), query processing has to meet various Quality-of-Service (QoS) requirements. In many data stream applications, processing delay is the most critical quality requirement, since the value of query results decreases dramatically over time. Keeping delay within a desired level is significantly harder under overload, which is common in data stream systems. When overloaded, DSMSs employ load shedding to meet quality requirements and keep pace with the high rate of data arrivals. Data stream applications are extremely dynamic due to bursty data arrivals and time-varying data processing costs. Current approaches ignore system status information in decision-making and consequently cannot achieve the desired control of quality under dynamic load. In this paper, we present a quality management framework that leverages well-studied feedback control techniques. We discuss the design and implementation of such a framework in a real DSMS, the Borealis stream manager. Our framework builds on system identification and rigorous controller analysis. Experimental results show that our solution achieves significantly fewer QoS (delay) violations with the same or lower level of data loss, compared to current strategies used in DSMSs. It is also robust and incurs negligible computational overhead.
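To make the idea of feedback-driven load shedding concrete, the following is a minimal sketch, not the controller designed in the paper. It assumes a toy single-queue model in which delay equals queue length divided by processing capacity, and it uses a generic proportional-integral (PI) rule to choose the fraction of arriving tuples to drop. The names `DelayController` and `simulate`, the gains `kp` and `ki`, and all numeric values are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DelayController:
    """Generic PI controller mapping measured delay to a drop fraction.

    Illustrative only: gains and structure are assumptions, not the
    controller derived in the paper.
    """
    target_delay: float   # desired delay, in sampling periods (assumed unit)
    kp: float = 0.5       # proportional gain (hypothetical value)
    ki: float = 0.1       # integral gain (hypothetical value)
    integral: float = 0.0

    def update(self, measured_delay: float) -> float:
        error = measured_delay - self.target_delay
        # Accumulate error, clamped so the integral term alone stays in [0, 1]
        # (a simple anti-windup measure).
        self.integral = min(1.0 / self.ki, max(0.0, self.integral + error))
        u = self.kp * error + self.ki * self.integral
        # The drop fraction must be a valid probability.
        return min(1.0, max(0.0, u))


def simulate(arrival_rates, capacity, controller=None):
    """Toy queue model: delay = backlog / capacity, sampled once per period."""
    queue = 0.0
    drop = 0.0
    delays = []
    for rate in arrival_rates:
        admitted = rate * (1.0 - drop)           # shed a fraction of arrivals
        queue = max(0.0, queue + admitted - capacity)
        delay = queue / capacity
        delays.append(delay)
        if controller is not None:
            drop = controller.update(delay)      # feedback: use system status
    return delays


if __name__ == "__main__":
    overload = [150.0] * 50                      # 50% more input than capacity
    uncontrolled = simulate(overload, capacity=100.0)
    controlled = simulate(overload, capacity=100.0,
                          controller=DelayController(target_delay=2.0))
    print(f"final delay without shedding: {uncontrolled[-1]:.2f}")
    print(f"final delay with PI shedding: {controlled[-1]:.2f}")
```

In this toy setting the open-loop backlog grows without bound, while the closed loop settles near the target delay because the shedding decision reacts to the measured delay rather than to the input rate alone. A real DSMS would need a model identified from the running system and gains chosen by controller analysis, as the abstract describes.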
