Monarch

Monarch is a globally distributed in-memory time series database system at Google. Monarch runs as a multi-tenant service and is used mostly to monitor the availability, correctness, performance, load, and other aspects of billion-user-scale applications and systems at Google. Every second, the system ingests terabytes of time series data into memory and serves millions of queries. Monarch has a regionalized architecture for reliability and scalability, with global query and configuration planes that integrate the regions into a unified system. On top of its distributed architecture, Monarch has flexible configuration, an expressive relational data model, and powerful queries. This paper describes the structure of the system and the novel mechanisms that achieve a reliable and flexible unified system on a regionalized distributed architecture. We also share important lessons learned from a decade’s experience of developing and running Monarch as a service at Google.

PVLDB Reference Format: Colin Adams, Luis Alonso, Benjamin Atkin, John Banning, Sumeer Bhola, Rick Buskens, Ming Chen, Xi Chen, Yoo Chung, Qin Jia, Nick Sakharov, George Talbot, Adam Tart, Nick Taylor. Monarch: Google’s Planet-Scale In-Memory Time Series Database. PVLDB, 13(12): 3181-3194, 2020. DOI: https://doi.org/10.14778/3181-3194
