Overcoming memory limitations in high-throughput event-based applications

The last decade has witnessed the emergence of business-critical applications that process streaming data in domains as diverse as credit card fraud detection, real-time recommendation systems, call-center monitoring, ad selection, and network monitoring. Most of these applications must compute hundreds or thousands of metrics continuously while coping with very high event input rates. As a consequence, large amounts of state (i.e., moving windows) need to be maintained, and this state very often exceeds the available memory. Nonetheless, current event processing platforms have little or no memory management capability, and they hang or simply crash when memory is exhausted. In this paper we report our experience in using secondary storage to solve the performance problems of memory-constrained event processing applications. To that end, we propose SlideM, a novel buffer management algorithm that exploits the access pattern of sliding windows to handle memory shortages efficiently. The proposed algorithm was implemented in a real stream processing engine and validated through an extensive experimental performance evaluation. The results corroborate the efficacy of the approach: the system sustained very high input rates (up to 300,000 events per second) for very large windows (about 30 GB) while consuming only a small amount of main memory (a few kilobytes).
