ESPBench: The Enterprise Stream Processing Benchmark

Growing data volumes and velocities in fields such as Industry 4.0 and the Internet of Things have increased the popularity of data stream processing systems. Enterprises can leverage these developments by enriching their core business data and analyses with up-to-date streaming data. Comparing streaming architectures for these complex use cases is challenging, as existing benchmarks do not cover them. ESPBench is a new enterprise stream processing benchmark that fills this gap. We present its architecture, the benchmarking process, and the query workload. We apply ESPBench to three state-of-the-art stream processing systems, Apache Spark, Apache Flink, and Hazelcast Jet, using the provided query implementations, which were developed with Apache Beam. Our results highlight the need for the provided ESPBench toolkit, which supports benchmark execution and enables query result validation and objective latency measurement.
