A Smart Background Scheduler for Storage Systems

In today's enterprise storage systems, data services such as snapshot delete or drive rebuild can impose substantial performance overhead if executed inline alongside heavy foreground IO, often causing missed Service Level Objectives (SLOs). Typical storage system applications such as Virtual Desktop Infrastructure (VDI) or web services follow a repetitive high/low workload pattern that can be learned and forecasted. We propose a priority-based background scheduler that learns this pattern and allows storage systems to maintain peak performance and meet SLOs while supporting a number of data services. When foreground IO demand intensifies, system resources are dedicated to servicing foreground IO requests, and any background processing that can be deferred is recorded to be processed in future idle cycles, as long as our forecaster predicts that the storage pool has remaining capacity. The smart background scheduler adopts a resource partitioning model that allows foreground and background IO to execute together as long as foreground IOs are not impacted, harnessing any free cycles to clear background debt. Using traces from VDI and web services applications, we show that our technique outperforms a static policy that sets fixed limits on the deferred background debt, reducing SLO violations from 54.6% (with a fixed background debt watermark) to only 6.2% when the limit is dynamically adjusted by our smart background scheduler.
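The deferral rule described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function `plan_interval`, the `Plan` record, and the single-number `forecast_headroom` (standing in for whatever the forecaster predicts about future capacity to absorb debt) are all hypothetical names and simplifications introduced here. All quantities are assumed to be in the same abstract work units per scheduling interval.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    foreground: int   # foreground work served this interval
    background: int   # background work executed this interval
    deferred: int     # background debt carried to later intervals


def plan_interval(capacity: int, fg_demand: int, bg_pending: int,
                  forecast_headroom: int) -> Plan:
    """Split one interval's capacity between foreground and background work.

    Assumption (illustrative only): forecast_headroom is the amount of debt
    the forecaster believes can still be absorbed in future idle cycles.
    """
    # Foreground IO has priority over all deferrable background work.
    foreground = min(fg_demand, capacity)

    # Any free cycles left over are used to clear background debt.
    free = capacity - foreground
    background = min(bg_pending, free)
    leftover = bg_pending - background

    # Defer only as much debt as the forecast says can be absorbed later.
    deferred = min(leftover, forecast_headroom)
    forced = leftover - deferred

    # Debt that cannot be deferred must run now, displacing foreground work.
    forced_run = min(forced, foreground)
    foreground -= forced_run
    background += forced_run
    deferred += forced - forced_run  # whatever still does not fit stays as debt

    return Plan(foreground, background, deferred)


if __name__ == "__main__":
    # Busy interval, ample forecast headroom: background work is deferred.
    print(plan_interval(capacity=100, fg_demand=90, bg_pending=30,
                        forecast_headroom=50))
    # Busy interval, little headroom: some background work displaces foreground.
    print(plan_interval(capacity=100, fg_demand=90, bg_pending=30,
                        forecast_headroom=5))
```

In this sketch, a static policy would correspond to passing a constant `forecast_headroom` every interval, while a forecast-driven policy would update it from the predicted workload pattern.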
