Building An Elastic Query Engine on Disaggregated Storage

We present operational experience running Snowflake, a cloudbased data warehousing system with SQL support similar to state-of-the-art databases. Snowflake design is motivated by three goals: (1) compute and storage elasticity; (2) support for multi-tenancy; and, (3) high performance. Over the last few years, Snowflake has grown to serve thousands of customers executing millions of queries on petabytes of data every day. We discuss Snowflake design with a particular focus on ephemeral storage system design, query scheduling, elasticity and efficiently supporting multi-tenancy. Using statistics collected during execution of 70 million queries over a 14 day period, our study highlights how recent changes in cloud infrastructure have altered the many assumptions that guided the design and optimization of Snowflake, and outlines several interesting avenues of future research.

[1]  Mengyuan Li,et al.  Peeking Behind the Curtains of Serverless Platforms , 2018, USENIX Annual Technical Conference.

[2]  Christoforos E. Kozyrakis,et al.  Flash storage disaggregation , 2016, EuroSys.

[3]  David J. DeWitt,et al.  Weaving Relations for Cache Performance , 2001, VLDB.

[4]  Scott Shenker,et al.  Network Requirements for Resource Disaggregation , 2016, OSDI.

[5]  Srikanth Kandula,et al.  PACMan: Coordinated Memory Caching for Parallel Jobs , 2012, NSDI.

[6]  David R. Karger,et al.  New Algorithms for Load Balancing in Peer-to-Peer Systems , 2003 .

[7]  Panos Vassiliadis,et al.  A Survey of Extract-Transform-Load Technology , 2009, Int. J. Data Warehous. Min..

[8]  Christoforos E. Kozyrakis,et al.  Pocket: Elastic Ephemeral Storage for Serverless Analytics , 2018, OSDI.

[9]  Christoforos E. Kozyrakis,et al.  ReFlex: Remote Flash ≈ Local Flash , 2017, ASPLOS.

[10]  Marcos K. Aguilera,et al.  Remote memory in the age of fast networks , 2017, SoCC.

[11]  Xi Yang,et al.  Elfen Scheduling: Fine-Grain Principled Borrowing from Latency-Critical Workloads Using Simultaneous Multithreading , 2016, USENIX Annual Technical Conference.

[12]  Ashish Motivala,et al.  The Snowflake Elastic Data Warehouse , 2016, SIGMOD Conference.

[13]  Xiao Zhang,et al.  CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.

[14]  Anurag Gupta,et al.  Amazon Redshift and the Case for Simpler Data Warehouses , 2015, SIGMOD Conference.

[15]  Randy H. Katz,et al.  Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.

[16]  Andrey Gubarev,et al.  Dremel : Interactive Analysis of Web-Scale Datasets , 2011 .

[17]  Kang G. Shin,et al.  Efficient Memory Disaggregation with Infiniswap , 2017, NSDI.

[18]  Sameh Elnikety,et al.  PerfIso: Performance Isolation for Commercial Latency-Sensitive Services , 2018, USENIX Annual Technical Conference.

[19]  Surajit Chaudhuri,et al.  An overview of data warehousing and OLAP technology , 1997, SGMD.

[20]  Timos K. Sellis,et al.  Optimizing ETL processes in data warehouses , 2005, 21st International Conference on Data Engineering (ICDE'05).

[21]  Carl A. Waldspurger,et al.  Memory resource management in VMware ESX server , 2002, OSDI '02.

[22]  Andrew Warfield,et al.  Decibel: Isolation and Sharing in Disaggregated Rack-Scale Storage , 2017, NSDI.

[23]  Jorge Bernardino,et al.  Real-time data warehouse loading methodology , 2008, IDEAS '08.

[24]  David A. Maltz,et al.  Network traffic characteristics of data centers in the wild , 2010, IMC '10.

[25]  Ryan Stutsman,et al.  Memshare: a Dynamic Multi-tenant Key-value Cache , 2017, USENIX Annual Technical Conference.

[26]  Ali Ghodsi,et al.  FairRide: Near-Optimal, Fair Cache Sharing , 2016, NSDI.

[27]  Ion Stoica,et al.  Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure , 2019, NSDI.