Paper Information - TS-Benchmark: A Benchmark for Time Series Databases

Time series data is widely used in scenarios such as supply chain management, stock data analysis, and smart manufacturing. A number of time series database systems have been developed to manage and query large volumes of time series data. We observe that existing benchmarks for time series databases focus on complex analysis workloads, such as pattern matching and trend prediction, whose performance may depend heavily on the data analysis algorithms rather than on the back-end database. However, in many real applications of time series databases, users are more interested in performance metrics such as data injection throughput and query processing time. A benchmark is still needed to compare time series databases extensively on these metrics. We introduce such a benchmark, TS-Benchmark, which is based mainly on a device monitoring scenario for wind turbines. A DCGAN-based data generation model is proposed to generate large volumes of time series data from a sample of real time series data. The workloads fall into three categories: data loading (in batch), streaming data injection, and historical data access (for typical queries). We implement the benchmark and compare four representative time series databases: InfluxDB, TimescaleDB, Druid, and OpenTSDB. The results are reported and analyzed.
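As a rough illustration of the two metrics named in the abstract (injection throughput and query processing time), the sketch below times batch loading and a range query. It is not the TS-Benchmark implementation: `InMemoryStore`, `measure_load_throughput`, `measure_query_time`, the device name `turbine-1`, and the batch size are all hypothetical, and the in-memory store only stands in for a real database client.

```python
import time
from bisect import bisect_left, bisect_right


class InMemoryStore:
    """Hypothetical stand-in for a time series database client (assumption)."""

    def __init__(self):
        # device_id -> (timestamps, values); timestamps are assumed to arrive in order
        self.series = {}

    def write_batch(self, device_id, points):
        timestamps, values = self.series.setdefault(device_id, ([], []))
        for ts, value in points:
            timestamps.append(ts)
            values.append(value)

    def range_query(self, device_id, start, end):
        timestamps, values = self.series.get(device_id, ([], []))
        lo = bisect_left(timestamps, start)
        hi = bisect_right(timestamps, end)  # end is inclusive
        return list(zip(timestamps[lo:hi], values[lo:hi]))


def measure_load_throughput(store, device_id, points, batch_size):
    """Return points written per second when loading in fixed-size batches."""
    t0 = time.perf_counter()
    for i in range(0, len(points), batch_size):
        store.write_batch(device_id, points[i:i + batch_size])
    elapsed = time.perf_counter() - t0
    return len(points) / elapsed if elapsed > 0 else float("inf")


def measure_query_time(store, device_id, start, end):
    """Return (elapsed seconds, rows) for one historical range query."""
    t0 = time.perf_counter()
    rows = store.range_query(device_id, start, end)
    return time.perf_counter() - t0, rows


# Synthetic readings for illustration only; not real turbine data.
points = [(ts, float(ts % 7)) for ts in range(10_000)]
store = InMemoryStore()
throughput = measure_load_throughput(store, "turbine-1", points, batch_size=500)
latency, rows = measure_query_time(store, "turbine-1", 100, 199)
print(f"loaded at {throughput:.0f} points/s; "
      f"query returned {len(rows)} rows in {latency * 1e6:.1f} microseconds")
```

To measure a real system, `InMemoryStore` would be replaced by a thin wrapper around that database's client library, with the timing functions left unchanged so that every system is measured the same way.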
