Workload characterization for the design of future servers

Workload characterization has become an integral part of the design of future servers, because a workload's characteristics help developers understand its requirements and how the underlying architecture can optimize the performance of the intended workload. In this paper, we give an overview of the POWER5 architecture. We also introduce the POWER5 performance monitor facilities and the performance events used to construct a CPI (cycles per instruction) breakdown model. For our study, we characterize four groups of workloads: commercial, HPC, memory, and scientific. Using data obtained from the POWER5 performance counters, we break down the CPI stack into a base component, covering cycles in which the processor is completing work, and a stall component, covering cycles in which the processor is not completing instructions. The stall component can be further divided into cycles when the pipeline was empty and cycles when the pipeline was not empty but completion was stalled. With this model, we quantify the processing cycles (expressed as a fraction of the CPI) that a workload spends progressing through the core resources, as well as the penalty it incurs upon encountering resource usage inhibitors. The results show the CPI breakdown for each workload, identifying where each workload spends its processing cycles and the associated CPI cost of accessing the core resources.
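The CPI stack decomposition described above can be illustrated with a small sketch. It assumes four aggregate counts per workload run: total cycles, instructions completed, cycles in which work completed, and stall cycles in which the pipeline was empty. The field and function names (CounterSample, cpi_stack) are hypothetical and are not POWER5 performance event names. Each component is divided by the instruction count so that the base and stall components sum to the total CPI, and the two stall subcomponents sum to the stall component.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical aggregate counts for one workload run.

    Field names are illustrative only; they are not POWER5 event names.
    """
    cycles: int                 # total processor cycles
    instructions: int           # instructions completed
    completion_cycles: int      # cycles in which the processor completed work
    empty_pipeline_cycles: int  # stall cycles in which the pipeline was empty


def cpi_stack(s: CounterSample) -> dict[str, float]:
    """Split total CPI into base and stall components (a sketch of the model)."""
    if s.instructions <= 0:
        raise ValueError("instructions must be positive")
    if s.completion_cycles + s.empty_pipeline_cycles > s.cycles:
        raise ValueError("component cycles exceed total cycles")

    # Stall cycles: every cycle in which no instructions completed.
    stall_cycles = s.cycles - s.completion_cycles
    # Remaining stall cycles: pipeline not empty, but completion was stalled.
    completion_stall_cycles = stall_cycles - s.empty_pipeline_cycles

    n = s.instructions
    return {
        "total": s.cycles / n,
        "base": s.completion_cycles / n,
        "stall": stall_cycles / n,
        "stall.empty_pipeline": s.empty_pipeline_cycles / n,
        "stall.completion_stalled": completion_stall_cycles / n,
    }


if __name__ == "__main__":
    # Made-up numbers, used only to show the arithmetic.
    sample = CounterSample(cycles=1000, instructions=500,
                           completion_cycles=400, empty_pipeline_cycles=150)
    for name, value in cpi_stack(sample).items():
        print(f"{name:>26}: {value:.2f}")
```

With these made-up counts the total CPI of 2.0 splits into a base of 0.8 and a stall of 1.2, and the stall further splits into 0.3 (empty pipeline) and 0.9 (completion stalled). A finer breakdown of the completion-stalled cycles by cause would follow the same pattern of subtracting attributed cycles and dividing by the instruction count.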