Optimizing Storage System Design for Timeseries Processing

The increase in high-precision, high-sample-rate telemetry timeseries poses a problem for existing timeseries databases, which can neither cope with the throughput demands of these streams nor provide the primitives needed to analyze them effectively. We present a novel abstraction for telemetry timeseries data and a data structure that provides this abstraction: a time-partitioning, version-annotated, copy-on-write tree. An implementation in Go is shown to outperform existing solutions, demonstrating a throughput of 53 million inserted values per second and 119 million queried values per second on a four-node cluster. The system achieves a 2.9x compression ratio and satisfies statistical queries spanning a year of data in under 200 ms, as demonstrated on a year-long production deployment storing 2.1 trillion data points. The principles and design of this database are generally applicable to a large variety of timeseries types and represent a significant advance in the development of technology for the Internet of Things.
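
The abstract names the data structure without describing its layout. As a rough illustration only, and not the implementation evaluated above, the following Go sketch shows one way a time-partitioning, version-annotated, copy-on-write tree could work. All names (Point, Tree, Insert, Query), the fanout of 4, the leaf width of 16 time units, and the fixed root range are assumptions chosen for brevity. Each insert copies only the path from the root to the affected leaf, so earlier versions stay queryable, and each node stores aggregates so that a statistical query can skip fully covered subtrees.

```go
package main

import (
	"fmt"
	"math"
)

// Illustrative constants (assumptions, not taken from the paper).
const (
	fanout    = 4
	leafWidth = 16
	rootWidth = leafWidth * fanout * fanout * fanout // root covers [0, 1024)
)

// Point is one timestamped sample.
type Point struct {
	Time  int64
	Value float64
}

// Stats is a statistical summary of a set of points.
type Stats struct {
	Count         int
	Min, Max, Sum float64
}

func (s *Stats) add(o Stats) {
	if o.Count == 0 {
		return
	}
	if s.Count == 0 {
		s.Min, s.Max = o.Min, o.Max
	} else {
		s.Min, s.Max = math.Min(s.Min, o.Min), math.Max(s.Max, o.Max)
	}
	s.Count += o.Count
	s.Sum += o.Sum
}

// node covers the time range [start, start+width) and is never mutated
// once it has been linked into a published version.
type node struct {
	start, width int64
	version      uint64  // tree version in which this node was created
	agg          Stats   // aggregates over every point below this node
	children     []*node // internal nodes: one child per time partition
	points       []Point // leaf nodes: raw samples
}

// insert returns a new node for version ver; untouched subtrees are shared.
func insert(old *node, start, width int64, p Point, ver uint64) *node {
	n := &node{start: start, width: width, version: ver}
	if old != nil {
		n.agg = old.agg
	}
	n.agg.add(Stats{Count: 1, Min: p.Value, Max: p.Value, Sum: p.Value})
	if width <= leafWidth {
		if old != nil {
			n.points = append(n.points, old.points...) // copy, never alias
		}
		n.points = append(n.points, p)
		return n
	}
	n.children = make([]*node, fanout)
	if old != nil {
		copy(n.children, old.children) // share all siblings with the old version
	}
	cw := width / fanout
	i := (p.Time - start) / cw
	n.children[i] = insert(n.children[i], start+i*cw, cw, p, ver)
	return n
}

func query(n *node, from, to int64, s *Stats) {
	if n == nil || to <= n.start || from >= n.start+n.width {
		return
	}
	if from <= n.start && n.start+n.width <= to {
		s.add(n.agg) // fully covered: use stored aggregates, no descent
		return
	}
	for _, c := range n.children {
		query(c, from, to, s)
	}
	for _, p := range n.points {
		if p.Time >= from && p.Time < to {
			s.add(Stats{Count: 1, Min: p.Value, Max: p.Value, Sum: p.Value})
		}
	}
}

// Tree keeps one root per version; version 0 is the empty tree.
type Tree struct{ roots []*node }

func NewTree() *Tree { return &Tree{roots: []*node{nil}} }

// Insert adds p and returns the number of the new version it creates.
func (t *Tree) Insert(p Point) (uint64, error) {
	if p.Time < 0 || p.Time >= rootWidth {
		return 0, fmt.Errorf("time %d outside [0, %d)", p.Time, int64(rootWidth))
	}
	ver := uint64(len(t.roots))
	t.roots = append(t.roots, insert(t.roots[ver-1], 0, rootWidth, p, ver))
	return ver, nil
}

// Query summarizes the points in [from, to) as seen at version ver.
func (t *Tree) Query(ver uint64, from, to int64) Stats {
	var s Stats
	if ver < uint64(len(t.roots)) {
		query(t.roots[ver], from, to, &s)
	}
	return s
}

func main() {
	tree := NewTree()
	v1, _ := tree.Insert(Point{Time: 10, Value: 1.5})
	v2, _ := tree.Insert(Point{Time: 500, Value: 4.0})
	fmt.Println(tree.Query(v1, 0, rootWidth).Count, tree.Query(v2, 0, rootWidth).Count)
}
```

In this sketch a query over an aligned range touches only a handful of nodes regardless of how many points lie beneath them, which is the kind of property that makes long-range statistical queries cheap. A production design would also need persistence, compression, and a much larger time range, none of which are modeled here.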
