Resilient workload manager: taming bursty workload of scaling internet applications

In data centers hosting scaling Internet applications, operators face a tradeoff between resource efficiency and Quality of Service (QoS), and the root cause lies in workload dynamics. In this paper, we address the problem with the design of the Resilient Workload Manager (ROM). ROM explicitly segregates base workload and trespassing workload, the two naturally different components of application workload, and manages them separately in two resource zones with specialized optimization techniques. As a comprehensive workload management framework, ROM covers the workload, data, resources, and QoS of the target applications. It features a fast workload factoring algorithm that distributes incoming application requests between the two resource zones based not only on volume but also on content; integrated two-dimensional workload shaping, resource planning, and request dispatching schemes for efficient utilization of base workload zone resources; and a simple, high-performance system architecture for dynamic provisioning in the trespassing workload zone. Through extensive evaluation, we show that ROM can achieve resource efficiency (e.g., 54.9% server savings), guarantee QoS (based on client-side perceived service quality), reduce data access overhead in the trespassing workload zone during peak load (by up to two orders of magnitude), and adapt its processing speed (running faster during peak load periods than during regular periods).
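The idea of factoring requests on both volume and content can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name `WorkloadFactorer`, the per-interval capacity threshold, the sliding-window popularity count, and the parameters `base_capacity`, `window`, and `hot_k` are all assumptions made for illustration. The sketch sends requests to the base zone until an assumed capacity is reached, then diverts only requests for currently popular items to the trespassing zone, so that the trespassing zone would need to hold only a small set of hot data.

```python
from collections import Counter, deque


class WorkloadFactorer:
    """Hypothetical two-zone request router (illustrative, not ROM itself)."""

    def __init__(self, base_capacity, window=1000, hot_k=10):
        self.base_capacity = base_capacity   # assumed per-interval budget of the base zone
        self.recent = deque(maxlen=window)   # sliding window of recently requested items
        self.counts = Counter()              # request counts within the window
        self.hot_k = hot_k                   # how many top items are treated as "hot"
        self.base_load = 0                   # requests sent to the base zone this interval

    def start_interval(self):
        """Reset the volume counter at the start of a new measurement interval."""
        self.base_load = 0

    def _observe(self, item):
        # Forget the oldest request before the full deque drops it on append.
        if len(self.recent) == self.recent.maxlen:
            oldest = self.recent[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]
        self.recent.append(item)
        self.counts[item] += 1

    def route(self, item):
        """Return "base" or "trespassing" for a request for `item`."""
        self._observe(item)
        # Volume check: below the assumed capacity, everything stays in the base zone.
        if self.base_load < self.base_capacity:
            self.base_load += 1
            return "base"
        # Content check: above capacity, divert only requests for hot items.
        hot = {key for key, _ in self.counts.most_common(self.hot_k)}
        if item in hot:
            return "trespassing"
        # Simplification: unpopular items still go to the base zone when it is over budget.
        self.base_load += 1
        return "base"
```

In this sketch the excess load is assumed to be concentrated on a few popular items, which is why diverting only those items is enough to relieve the base zone; a workload without that property would need a different policy.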
