A Performance Analysis of System S, S4, and Esper via Two Level Benchmarking

Data stream processing systems have become popular because they are effective in large-scale data stream processing scenarios. This paper compares and contrasts the performance characteristics of three stream processing systems: System S, S4, and Esper. We study which software aspects shape the characteristics of the workloads handled by these systems. We use a micro benchmark and several real-world stream applications on System S, S4, and Esper to construct 70 different application scenarios. As performance metrics, we use the job throughput, CPU and memory consumption, and network utilization of each application scenario. We observed that S4's architectural choice of instantiating a Processing Element (PE) for each keyed attribute is less efficient than the fixed number of PEs used by System S and Esper. Furthermore, all the Esper benchmarks achieved more than 150% higher performance on a single node than the S4 benchmarks. S4 and Esper are more portable than System S and can easily be fine-tuned for different application scenarios. In the future, we hope to widen our understanding of the performance characteristics of these systems through code-level profiling.
