Load-aware shedding in stream processing systems

Load shedding is a technique employed by stream processing systems to handle unpredictable spikes in the input load when the available computing resources are under-provisioned. A load shedder drops tuples to keep the input load below a critical threshold and thus avoid tuple queuing and system thrashing. In this paper we propose Load-Aware Shedding (LAS), a novel load shedding solution that drops tuples with the aim of keeping queuing times below a tunable threshold. Tuple execution durations are estimated at runtime using efficient sketch data structures. We provide a theoretical analysis proving that LAS is an (ε, δ)-approximation of the optimal online load shedder, and we evaluate its performance both through simulations and on a running prototype.
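
To make the idea concrete, the following is a minimal Python sketch of the two ingredients the abstract names: a sketch data structure that estimates per-tuple execution durations, and a shedder that drops tuples once the estimated queuing time exceeds a threshold. It is an illustration under stated assumptions, not the LAS algorithm itself, whose details are not given here. The names `DurationSketch`, `shed`, and `tau`, the integer tuple keys, the Count-Min-style layout, and the backlog-draining model are all hypothetical choices made for this example.

```python
import random


class DurationSketch:
    """Count-Min-style sketch tracking, per cell, how many tuples were
    seen and their cumulated execution time (hypothetical design)."""

    PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used for hashing

    def __init__(self, rows=4, cols=64, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        # One (a, b) pair per row: h(x) = ((a*x + b) mod PRIME) mod cols
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(rows)]
        self.count = [[0] * cols for _ in range(rows)]
        self.total = [[0.0] * cols for _ in range(rows)]

    def _cell(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.cols

    def update(self, key, duration):
        """Record one measured execution duration for an integer key."""
        for r in range(len(self.hashes)):
            c = self._cell(r, key)
            self.count[r][c] += 1
            self.total[r][c] += duration

    def estimate(self, key, default=0.0):
        """Mean duration from the row whose cell has the fewest hits,
        i.e. the one least polluted by colliding keys."""
        best = None
        for r in range(len(self.hashes)):
            c = self._cell(r, key)
            n = self.count[r][c]
            if n and (best is None or n < best[0]):
                best = (n, self.total[r][c])
        return default if best is None else best[1] / best[0]


def shed(arrivals, sketch, tau):
    """Decide, for each (arrival_time, key), whether to keep the tuple.

    Assumes a single operator that drains its backlog at one time unit
    of work per unit of elapsed time. A tuple is dropped when the
    estimated queuing time (the current backlog) exceeds tau.
    """
    backlog, last, decisions = 0.0, None, []
    for t, key in arrivals:
        if last is not None:
            backlog = max(0.0, backlog - (t - last))
        last = t
        if backlog > tau:
            decisions.append(False)           # shed the tuple
        else:
            backlog += sketch.estimate(key)   # enqueue the tuple
            decisions.append(True)
    return decisions
```

In this toy model the sketch would be fed with `update(key, duration)` each time an operator finishes a tuple, and `shed` would consult it on every arrival. Picking the row with the smallest counter is one simple heuristic for limiting the effect of hash collisions; other estimators are possible.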
