2TL: A Scheduling Algorithm for Meeting the Latency Requirements of Bursty I/O Streams at User-Specified Percentiles

In a cloud data center, a storage system is commonly shared by front-end, user-facing applications and back-end, data-intensive applications running on different virtual machines (VMs). Meeting the latency requirements of the I/O streams generated by the VMs that run front-end applications is necessary but difficult, because (1) these requirements are often specified at percentiles and (2) some of these streams issue requests in bursts. This paper proposes 2TL, a scheduling algorithm designed to meet the latency requirements of these applications. To meet latency requirements at user-specified percentiles, 2TL continuously controls the number of requests that expire before being serviced. To handle request bursts, it proactively adjusts its scheduling parameters to avoid violating latency requirements. We evaluated 2TL on a simulated RAID storage system using workloads composed of concurrent I/O streams that cover a wide range of access characteristics, including burstiness. In this evaluation, latency requirements were specified at various percentiles found in the literature. When the storage system was sufficiently provisioned, 2TL met the latency requirements of each workload without degrading storage system performance.
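A latency requirement stated at a percentile has a simple counting consequence, which is one way to read why controlling "the number of requests that expire" matters. Under the nearest-rank definition of a percentile, a stream meets latency target L at percentile p over N requests exactly when at most floor((1 - p) * N) of those requests exceed L. The following is a minimal sketch of that bookkeeping only, not 2TL's scheduling logic. It assumes the nearest-rank definition and treats a request as expired when its measured latency exceeds the target, whereas 2TL counts requests that expire before being serviced. `ExpirationTracker` and `nearest_rank_percentile` are hypothetical names introduced for illustration.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(latencies, percentile):
    """Return the p-th percentile using the nearest-rank definition:
    the ceil(p * N)-th smallest value (1-based)."""
    p = Fraction(str(percentile))
    ordered = sorted(latencies)
    rank = math.ceil(p * len(ordered))
    return ordered[rank - 1]


class ExpirationTracker:
    """Hypothetical helper (not from the 2TL paper): counts requests whose
    latency exceeds a target and compares that count with the number of
    misses the percentile requirement allows."""

    def __init__(self, target_ms, percentile):
        if not 0 < percentile < 1:
            raise ValueError("percentile must lie in (0, 1), e.g. 0.95")
        self.target_ms = target_ms
        # Exact rational arithmetic, so that 1 - 0.9 is exactly 1/10
        # rather than a float slightly below 0.1.
        self.miss_fraction = 1 - Fraction(str(percentile))
        self.completed = 0
        self.expired = 0

    def record(self, latency_ms):
        """Record one completed request; it counts as expired if it misses the target."""
        self.completed += 1
        if latency_ms > self.target_ms:
            self.expired += 1

    def allowed_expirations(self):
        """Out of N requests, at most floor((1 - p) * N) may exceed the target."""
        return math.floor(self.miss_fraction * self.completed)

    def requirement_met(self):
        """True iff the nearest-rank p-th percentile latency is within the target."""
        return self.expired <= self.allowed_expirations()


tracker = ExpirationTracker(target_ms=10.0, percentile=0.95)
for latency in [5.0] * 95 + [20.0] * 5:
    tracker.record(latency)
print(tracker.allowed_expirations(), tracker.requirement_met())
```

Running it prints `5 True`: with a 10 ms target at the 95th percentile, 5 of 100 requests may miss the target, and one more miss among 101 requests would violate the requirement. Exact fractions are used because floating-point subtraction such as `1 - 0.9` can round the allowance down by one.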
