A framework for profiling multiprocessor memory performance

Because of the increasing gap between processor frequency and dynamic random access memory (DRAM) speed, the performance of the memory subsystem typically governs that of the system as a whole. This is especially true for symmetric multiprocessor (SMP) systems. Therefore, performance evaluation methodologies that facilitate the analysis and optimization of the memory subsystem are essential. This paper describes such a methodology, a performance evaluation framework, and demonstrates its power, speed, and flexibility in a study of the TPC-C benchmark executed on 8- and 32-processor IBM pSeries 690 (p690) systems. The framework facilitates analysis of sampled performance monitor event traces that are collected in real time. The analyses are used to characterize the locality of reference exhibited by TPC-C data loads at the various levels of the memory hierarchy and to evaluate how well the design and policies of the p690 memory hierarchy meet the demands of the workload.