A performance methodology for commercial servers

This paper discusses a methodology for analyzing and optimizing the performance of commercial servers. Commercial server workloads are shown to have unique characteristics that expand the set of elements that must be optimized to achieve good performance and that require a distinct performance methodology. The steps in the process of server performance optimization are described and include the following:

1. Selection of representative commercial workloads and identification of key characteristics to be evaluated.
2. Collection of performance data. Various instrumentation techniques are discussed in light of the requirements that commercial server workloads place on the instrumentation.
3. Creation of input data for performance models on the basis of measured workload information. This step must overcome the differences in operating environment between the measured system under test and the target system design to be modeled.
4. Creation of performance models. Two general types are described: high-level models and detailed cycle-accurate simulators. These types are applied to model the processor, memory, and I/O system.
5. System performance optimization. The tuning of the operating system and application software is described.

Optimizing the performance of commercial applications is not simply an exercise in using traces to maximize the processor MIPS. Equally significant are items such as the use of probabilities to reflect future workload characteristics, software tuning, cache miss rate optimization, memory management, and I/O performance. The paper presents techniques for evaluating the performance of each of these key contributors so as to optimize the overall performance and cost/performance of commercial servers.
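To illustrate what a high-level model of the kind mentioned in step 4 might look like, the following is a minimal sketch, not the paper's own model. It assumes a simple additive cycles-per-instruction (CPI) formulation in which memory stalls are added to a core CPI; the function names and all numeric inputs are hypothetical and chosen only to show why the cache miss rate can matter as much as raw processor speed.

```python
def effective_cpi(core_cpi, misses_per_instr, miss_penalty_cycles):
    """Additive CPI model: core CPI plus average memory-stall cycles per instruction.

    All three parameters are illustrative inputs, not measured values.
    """
    return core_cpi + misses_per_instr * miss_penalty_cycles


def mips(clock_mhz, cpi):
    """Millions of instructions per second for a given clock rate and CPI."""
    return clock_mhz / cpi


# Hypothetical workload: 2 cache misses per 100 instructions, 100-cycle penalty.
baseline_cpi = effective_cpi(core_cpi=1.0, misses_per_instr=0.02, miss_penalty_cycles=100)

# Same core, but the miss rate is halved (for example, through software tuning).
tuned_cpi = effective_cpi(core_cpi=1.0, misses_per_instr=0.01, miss_penalty_cycles=100)

print(f"baseline: CPI={baseline_cpi:.2f}, MIPS={mips(600, baseline_cpi):.0f}")
print(f"tuned:    CPI={tuned_cpi:.2f}, MIPS={mips(600, tuned_cpi):.0f}")
```

Under these assumed inputs, halving the miss rate changes throughput far more than a modest improvement in core CPI would, which is consistent with the abstract's point that processor MIPS alone is not the whole optimization target. A detailed cycle-accurate simulator would replace the single additive term with explicit modeling of the pipeline, caches, and I/O.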
