Stream Processing Systems Benchmark: StreamBench

Batch processing technologies (Such as MapReduce, Hive, Pig) have matured and been widely used in the industry. These systems solved the issue processing big volumes of data successfully. However, first big amount of data need to be collected and stored in a database or file system. That is very time-consuming. Then it takes time to finish batch processing analysis jobs before get any results. While there are many cases that need analysed results from unbounded sequence of data in seconds or sub-seconds. To satisfy the increasing demand of processing such streaming data, several streaming processing systems are implemented and widely adopted, such as Apache Storm, Apache Spark, IBM InfoSphere Streams, and Apache Flink. While how to evaluate stream processing systems before choosing one in production development is an open question. In this book, we introduce StreamBench, a benchmark framework to facilitate performance comparisons of stream processing systems. A key feature of the StreamBench framework is that it is extensible -- it supports easy definition of new workloads, in addition to making it easy to benchmark new stream processing systems.