Centaur: Host-Side SSD Caching for Storage Performance Control

Host-side SSD caches are a powerful knob for improving and controlling storage performance and for improving performance isolation. We present Centaur, a host-side SSD caching solution that uses cache sizing as a control knob to achieve storage performance goals. Centaur implements dynamically partitioned per-VM caches with per-partition local replacement to provide a lower cache miss rate, better performance isolation, and performance control for VM workloads. It uses SSD cache sizing as a universal knob for meeting a variety of workload-specific goals, including per-VM latency and IOPS reservations, proportional-share fairness, and aggregate optimizations such as minimizing the average latency across VMs. We implemented Centaur for the VMware ESX hypervisor. With Centaur, the time to boot 28 virtual desktops simultaneously improves by 42% relative to a non-caching system and by 18% relative to a unified caching system. Centaur also enforces per-VM latency shares with less than 5% error when running microbenchmarks, and enforces latency and IOPS reservations on OLTP workloads with less than 10% error.
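As a rough illustration of how cache sizing can serve as a control knob, the following Python sketch assumes that each VM's miss ratio curve (miss ratio as a function of allocated cache units) is already known and that latency follows a simple two-level model: SSD hit latency versus backend miss latency. The names `VM`, `expected_latency`, `min_units_for_latency`, and `allocate_min_avg_latency` are hypothetical. The allocator uses a standard greedy marginal-gain technique, which is not necessarily what Centaur does; it is optimal when every curve is convex and only a heuristic otherwise.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VM:
    """Hypothetical per-VM profile; not Centaur's actual data structure."""
    name: str
    iops: float        # request rate, used to weight latency across VMs
    mrc: List[float]   # mrc[k] = assumed miss ratio with k cache units (non-increasing)


def expected_latency(vm: VM, units: int, hit_us: float, miss_us: float) -> float:
    """Two-level model (an assumption): hits served by SSD, misses by backend storage."""
    m = vm.mrc[min(units, len(vm.mrc) - 1)]  # beyond the curve, assume no further gain
    return m * miss_us + (1.0 - m) * hit_us


def min_units_for_latency(vm: VM, target_us: float,
                          hit_us: float, miss_us: float) -> Optional[int]:
    """Smallest cache size meeting a per-VM latency reservation, or None if unreachable."""
    for k in range(len(vm.mrc)):
        if expected_latency(vm, k, hit_us, miss_us) <= target_us:
            return k
    return None


def allocate_min_avg_latency(vms: List[VM], total_units: int, hit_us: float,
                             miss_us: float,
                             reserved: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Greedily hand out cache units to minimize IOPS-weighted average latency."""
    alloc = {vm.name: 0 for vm in vms}
    for name, units in (reserved or {}).items():
        if name not in alloc:
            raise KeyError(f"unknown VM {name!r}")
        alloc[name] = units  # e.g. the output of min_units_for_latency
    remaining = total_units - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed cache capacity")
    for _ in range(remaining):
        best, best_gain = None, 0.0
        for vm in vms:
            k = alloc[vm.name]
            # Weighted latency reduction from giving this VM one more unit.
            gain = vm.iops * (expected_latency(vm, k, hit_us, miss_us)
                              - expected_latency(vm, k + 1, hit_us, miss_us))
            if gain > best_gain:  # strict '>' keeps the first VM on ties
                best, best_gain = vm, gain
        if best is None:
            break  # no VM benefits from more cache; leave the rest unallocated
        alloc[best.name] += 1
    return alloc


# Usage with made-up curves: A's misses drop steeply, B's curve is nearly flat.
a = VM("A", iops=100, mrc=[1.0, 0.5, 0.25, 0.25])
b = VM("B", iops=100, mrc=[1.0, 0.9, 0.8, 0.7])
print(allocate_min_avg_latency([a, b], 3, hit_us=100, miss_us=5000))  # {'A': 2, 'B': 1}
```

Once A's curve flattens, the third unit goes to B because it is the only VM that still gains from more cache. For a VM with a latency reservation, the result of `min_units_for_latency` can be passed in `reserved`, so that VM gets its floor first and the greedy pass distributes only the leftover capacity.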