Measuring Performance of Complex Event Processing Systems

Complex Event Processing (CEP), or stream data processing, is becoming increasingly popular as the platform underlying event-driven solutions and applications in industries such as financial services, oil & gas, smart grids, health care, and IT monitoring. Satisfactory performance is crucial for any solution in these industries. Typically, the performance of CEP engines is measured as (1) data rate, i.e., the number of input events processed per second, and (2) latency, i.e., the time it takes for a result (output event) to emerge from the system after the corresponding business event (input event) happened. While data rates are easy to measure by counting input events over time, latency is less well defined. Defining it becomes particularly challenging in the presence of data arriving out of order, that is, when the order in which events arrive at the system differs from the order of their timestamps. Many important distributed scenarios must deal with out-of-order arrival because communication delays easily introduce disorder. With out-of-order arrival, a CEP system cannot produce final answers as events arrive; instead, time must first progress far enough in the overall system before correct results can be produced. This waiting introduces additional latency beyond the time the system needs to process the events. We denote the former as information latency and the latter as system latency. This paper discusses both types of latency in detail and defines them formally without depending on the particular semantics of CEP query plans. In addition, the paper suggests incorporating these definitions as metrics into the benchmarks used to assess and compare CEP systems.
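To make these notions concrete, the following minimal Python sketch (an illustration only, not the paper's formal definitions) shows one way to record both metrics for a single output event. The field names result_timestamp, closed_at, and emitted_at are hypothetical: they stand for the application time at which a result logically holds, the wall-clock time at which the system has seen enough input to finalize that result (i.e., stream progress has passed that point), and the wall-clock time at which the output is emitted. The split further assumes that application timestamps and wall-clock time are drawn from comparable clocks.

from dataclasses import dataclass

@dataclass
class OutputEvent:
    # Hypothetical measurement record; field names are illustrative, not the paper's notation.
    result_timestamp: float  # application time at which the result logically holds
    closed_at: float         # wall clock when enough input had arrived to finalize the result
    emitted_at: float        # wall clock when the output event left the system

def latencies(out: OutputEvent) -> tuple[float, float]:
    # Information latency: time spent waiting for stream progress due to out-of-order arrival.
    information_latency = out.closed_at - out.result_timestamp
    # System latency: time the engine actually spent producing and emitting the result.
    system_latency = out.emitted_at - out.closed_at
    return information_latency, system_latency

def data_rate(num_input_events: int, elapsed_seconds: float) -> float:
    # Data rate: input events processed per second over a measurement interval.
    return num_input_events / elapsed_seconds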
