Operational Analysis of Parallel Servers

Multicore processors promise continued hardware performance improvements even as single-core performance flattens out. However, they also enable increasingly complex application software that threatens to obscure application-level performance. This paper applies operational analysis to the problem of understanding and predicting application-level performance in parallel servers. We present operational laws that offer both insight and actionable information based on lightweight, passive, external observations of black-box applications. One law accurately infers queuing delays; others predict the performance implications of expanding or reducing capacity. The former enables improved monitoring and system management; the latter enable capacity planning and dynamic resource provisioning to incorporate application-level performance in a principled way. Our laws rest upon a handful of weak assumptions that are easy to test and widely satisfied in practice. We show that the laws are broadly applicable across many practical CPU scheduling policies. Experimental results on a multicore network server in an enterprise data center demonstrate the usefulness of our laws.
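The abstract does not state the new laws themselves. As a hedged illustration of the kind of inference that passive, external observations permit, the sketch below applies only the classic operational relations of Denning and Buzen (the Utilization Law and Little's Law), not the laws introduced in this paper. It assumes the CPU is the only resource a request uses, so any time a request spends in the server without running on a core counts as delay. `Observation`, `operational_summary`, and their fields are hypothetical names. The Little's Law estimate of the mean number of requests in the server is exact only when the window begins and ends with the server idle (flow balance); otherwise it is an approximation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Observation:
    """Hypothetical record of passive, external measurements over one window."""
    window_s: float                    # length T of the observation window, seconds
    response_times_s: Sequence[float]  # response time of each request completed in the window
    cpu_busy_s: float                  # aggregate CPU busy time B, summed over all cores
    cores: int                         # number of cores k


def operational_summary(obs: Observation) -> Dict[str, float]:
    """Apply classic operational relations; assumes the CPU is the only resource used."""
    completions = len(obs.response_times_s)
    if completions == 0 or obs.window_s <= 0 or obs.cores <= 0:
        raise ValueError("need a positive window, at least one core, and one completion")

    throughput = completions / obs.window_s                    # X = C / T
    mean_response = sum(obs.response_times_s) / completions    # R, averaged over completions
    demand = obs.cpu_busy_s / completions                      # D = B / C, so B / T = X * D (Utilization Law)
    utilization = obs.cpu_busy_s / (obs.cores * obs.window_s)  # per-core utilization U = X * D / k
    mean_in_server = throughput * mean_response                # N = X * R (Little's Law)
    # Time in the server not spent on a core. Under the CPU-only assumption
    # this is queuing delay; it can go negative if one request runs on several
    # cores at once, which violates the assumption.
    mean_delay = mean_response - demand

    return {
        "throughput": throughput,
        "mean_response": mean_response,
        "demand": demand,
        "utilization": utilization,
        "mean_in_server": mean_in_server,
        "mean_delay": mean_delay,
    }


if __name__ == "__main__":
    obs = Observation(window_s=8.0, response_times_s=[0.25, 0.5, 0.75, 0.5],
                      cpu_busy_s=1.0, cores=2)
    print(operational_summary(obs))
```

For example, an 8-second window on a 2-core server with four completed requests (response times 0.25, 0.5, 0.75 and 0.5 s) and 1.0 s of aggregate CPU time yields X = 0.5 requests/s, R = 0.5 s, D = 0.25 s, a mean delay of 0.25 s, per-core utilization 0.0625, and N = 0.25. Every input is an external measurement (timestamps and an OS-level busy-time counter), which is what makes this style of analysis applicable to black-box applications.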
