Online Capacity Identification of Multitier Websites Using Hardware Performance Counters

Understanding server capacity is crucial to system capacity planning, configuration, and QoS-aware resource management. Conventional stress-testing approaches measure server capacity offline in terms of application-level performance metrics such as response time and throughput. Such approaches are limited in measurement accuracy and timeliness. In a multitier website, the resource bottleneck often shifts between tiers as the client access pattern changes, which makes online capacity measurement even more challenging. This paper presents an online measurement approach based on low-level hardware performance metrics such as instruction execution rate and cache access behavior. Together, these metrics define the system's internal running state. The approach uses machine learning techniques to infer application-level performance at each tier from a set of selected hardware performance counters. A coordinated predictor is induced over the individual tier-wide models to make global system performance predictions and to identify the bottleneck when the system becomes overloaded. Experiments were conducted on a two-tier Tomcat/MySQL website using the TPC-W benchmark. The results demonstrate that the approach achieves an overload prediction accuracy higher than 90 percent for an a priori known input traffic mix, and over 85 percent even for traffic that causes frequent bottleneck shifting. It incurs less than 0.5 percent runtime overhead for data collection and no more than 50 ms for each online decision.
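To make the two-level structure concrete (per-tier models over hardware counters, plus a coordinator that flags overload and names the bottleneck tier), here is a minimal sketch under stated assumptions. The counter features (IPC, L2 miss rate, bus utilization) are hypothetical, a Gaussian naive Bayes classifier stands in for whatever learner the paper actually uses, the coordinator is a simple max-probability rule rather than the paper's induced predictor, and the training data is synthetic and illustrative only.

```python
import math


class GaussianNB:
    """Minimal Gaussian naive Bayes over per-tier hardware-counter vectors.

    Stand-in learner (an assumption, not the paper's model).
    Labels: 0 = normal, 1 = overloaded.
    Hypothetical feature order: [ipc, l2_miss_rate, bus_util].
    """

    def fit(self, X, y):
        n = len(y)
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            prior = len(rows) / n
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # Population variance per feature, floored to avoid division by zero.
            variances = [
                max(sum((v - m) ** 2 for v in col) / len(rows), 1e-6)
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[c] = (prior, means, variances)
        return self

    def log_posteriors(self, x):
        # Unnormalized log P(class | x) = log prior + sum of Gaussian log-likelihoods.
        out = {}
        for c, (prior, means, variances) in self.stats.items():
            lp = math.log(prior)
            for v, m, s2 in zip(x, means, variances):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            out[c] = lp
        return out

    def prob_overload(self, x):
        # Normalize with the log-sum-exp trick for numerical stability.
        lp = self.log_posteriors(x)
        top = max(lp.values())
        z = sum(math.exp(v - top) for v in lp.values())
        return math.exp(lp.get(1, float("-inf")) - top) / z


def coordinated_predict(models, samples, threshold=0.5):
    """Hypothetical coordinator: the system is overloaded if the most
    likely-overloaded tier exceeds the threshold; that tier is the bottleneck."""
    probs = {tier: m.prob_overload(samples[tier]) for tier, m in models.items()}
    bottleneck = max(probs, key=probs.get)
    overloaded = probs[bottleneck] >= threshold
    return overloaded, (bottleneck if overloaded else None), probs


# Synthetic training data (illustrative only, not measurements from the paper).
web_X = [[1.2, 0.02, 0.30], [1.1, 0.03, 0.35], [0.5, 0.10, 0.80], [0.4, 0.12, 0.85]]
db_X = [[0.9, 0.05, 0.20], [1.0, 0.04, 0.25], [0.3, 0.20, 0.90], [0.35, 0.18, 0.88]]
labels = [0, 0, 1, 1]
models = {"web": GaussianNB().fit(web_X, labels), "db": GaussianNB().fit(db_X, labels)}

sample = {"web": [1.15, 0.025, 0.32], "db": [0.32, 0.19, 0.89]}
overloaded, bottleneck, probs = coordinated_predict(models, sample)
print(overloaded, bottleneck)  # -> True db
```

Keeping one model per tier means each model sees only that tier's counters, so a shift of the bottleneck from the web tier to the database tier shows up as a change in which tier model reports high overload probability, rather than requiring a single monolithic model to relearn the whole system.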
