Triage: performance isolation and differentiation for storage systems

Ensuring performance isolation and differentiation among workloads that share a storage infrastructure is a basic requirement in consolidated data centers. Existing management tools rely on resource provisioning to meet performance goals; they require detailed knowledge of the system characteristics and the workloads. Provisioning is inherently slow to react to system and workload dynamics, and in the general case it is impossible to provision for the worst case. We propose a software-only solution that ensures predictable performance for storage access. It is applicable to a wide range of storage systems and makes no assumptions about workload characteristics. We use an on-line feedback loop with an adaptive controller that throttles storage access requests so that the available system throughput is shared among workloads according to their performance goals and their relative importance. The controller treats the system as a "black box" and adapts automatically to system and workload changes. The controller is distributed to ensure high availability under overload conditions, and it can be used with both block and file access protocols. The evaluation of Triage, our experimental prototype, demonstrates workload isolation and differentiation in an overloaded cluster file system where workloads and system components are changing.
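The feedback loop described above can be illustrated with a minimal sketch. This is not Triage's actual controller, which the abstract does not specify; it is a simple proportional adjustment chosen for illustration. The names `Workload`, `control_step`, `latency_goal_ms`, `weight`, and `gain`, and the choice of latency as the measured quantity, are all assumptions. The sketch only shows the general idea: observe the system from the outside, adjust the total admitted throughput, and split it among workloads by relative importance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    latency_goal_ms: float  # performance goal (hypothetical unit and metric)
    weight: float           # relative importance (hypothetical)


def control_step(total_limit, workloads, measured_ms, gain=0.5):
    """Run one control interval; return (new_total_limit, per-workload limits).

    The system is treated as a black box: only observed latencies are used,
    never a model of the storage devices or of the workloads.
    """
    # Worst ratio of observed latency to goal; a value above 1 means a goal is missed.
    worst = max(measured_ms[w.name] / w.latency_goal_ms for w in workloads)
    # Shrink admitted throughput when goals are missed, grow it when there is headroom.
    # The floor of 0.1 keeps the limit positive after a large latency spike.
    factor = max(0.1, 1.0 + gain * (1.0 - worst))
    new_total = total_limit * factor
    # Share the admitted throughput in proportion to relative importance.
    weight_sum = sum(w.weight for w in workloads)
    limits = {w.name: new_total * w.weight / weight_sum for w in workloads}
    return new_total, limits


if __name__ == "__main__":
    workloads = [Workload("oltp", 10.0, 3.0), Workload("backup", 50.0, 1.0)]
    total, limits = control_step(1000.0, workloads, {"oltp": 20.0, "backup": 25.0})
    print(total, limits)
```

In this example the "oltp" workload observes twice its latency goal, so the sketch halves the total admitted rate and then divides it 3:1 between the two workloads. A real controller would also need to estimate its gain on-line and coordinate several throttling points, which this sketch leaves out.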
