Janus: Optimal Flash Provisioning for Cloud Storage Workloads

Janus is a system for partitioning the flash storage tier between workloads in a cloud-scale distributed file system with two tiers: flash storage and disk. The file system stores newly created files in the flash tier and moves them to the disk tier using either a First-In-First-Out (FIFO) policy or a Least-Recently-Used (LRU) policy, subject to per-workload allocations. Janus constructs compact metrics of the cacheability of the different workloads, using sampled distributed traces because of the large scale of the system. From these metrics, we formulate and solve an optimization problem to determine the flash allocation to workloads that maximizes the total reads sent to the flash tier, subject to operator-set priorities and bounds on flash write rates. Measurements from production workloads in multiple data centers that use these recommendations, together with traces of other production workloads, show that the resulting allocation improves the flash hit rate by 47–76% compared with a unified tier shared by all workloads. Based on these results and an analysis of several thousand production workloads, we conclude that flash storage is a cost-effective complement to disks in data centers.
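
The abstract does not give the exact formulation, but the allocation problem it describes can be sketched as follows. Assume, for illustration only, per-workload cacheability curves r_i(x_i) giving the read rate served from flash as a function of the flash allocation x_i, operator-set priority weights p_i, induced flash write rates w_i(x_i), a total flash budget F, and an aggregate write-rate bound W; these symbols are assumptions, not the paper's notation.

% Hypothetical sketch of the allocation problem; symbols are illustrative.
\begin{align*}
\max_{x_1,\ldots,x_n \,\ge\, 0} \quad & \sum_{i=1}^{n} p_i \, r_i(x_i)
  && \text{priority-weighted reads served from flash} \\
\text{subject to} \quad & \sum_{i=1}^{n} x_i \le F
  && \text{total flash capacity} \\
& \sum_{i=1}^{n} w_i(x_i) \le W
  && \text{bound on aggregate flash write rate}
\end{align*}

If the cacheability curves exhibit diminishing returns (concavity), one plausible way to solve such a problem is a greedy allocation that repeatedly gives flash to the workload with the highest marginal priority-weighted read benefit; whether this matches the paper's actual solution method is not stated in the abstract.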
