The impact of hyper-threading on processor resource utilization in production applications

Intel provides Hyper-Threading (HT) in processors based on its Pentium and Nehalem microarchitectures, such as the Westmere-EP. HT allows two threads to execute on each core in order to hide data-access latencies. The two threads can execute simultaneously, filling otherwise unused stages in the functional unit pipelines. To better understand HT-related issues, we collect Performance Monitoring Unit (PMU) data (instructions retired; unhalted core cycles; L2 and L3 cache hits and misses; vector and scalar floating-point operations, etc.). We then use the PMU data to calculate a new efficiency metric that quantifies processor resource utilization, and we compare that utilization between single-threading (ST) and HT modes. We also study the performance gain in terms of unhalted core cycles, how efficiently the code uses the processor's vector units, and the impact of HT mode on shared resources such as the L2 and L3 caches. Results from four full-scale, production-quality scientific applications from computational fluid dynamics (CFD) used by NASA scientists indicate that HT generally improves processor resource utilization efficiency, but that this improvement does not necessarily translate into an overall application performance gain.
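The abstract names the PMU events collected but does not define the new efficiency metric, so the sketch below does not attempt to reproduce it. It only illustrates, under stated assumptions, how conventional derived quantities (instructions per cycle, cache miss ratios, and the vectorized share of floating-point operations) could be computed from such counters and compared between ST and HT runs. The `PmuSample` type, its field names, and all counter values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PmuSample:
    """Hypothetical container for per-core PMU counts from one run."""
    instructions_retired: int
    unhalted_core_cycles: int
    l2_hits: int
    l2_misses: int
    l3_hits: int
    l3_misses: int
    vector_fp_ops: int
    scalar_fp_ops: int


def ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(s: PmuSample) -> dict:
    """Conventional derived metrics; NOT the paper's efficiency metric."""
    return {
        "ipc": ratio(s.instructions_retired, s.unhalted_core_cycles),
        "l2_miss_ratio": ratio(s.l2_misses, s.l2_hits + s.l2_misses),
        "l3_miss_ratio": ratio(s.l3_misses, s.l3_hits + s.l3_misses),
        "vector_fraction": ratio(
            s.vector_fp_ops, s.vector_fp_ops + s.scalar_fp_ops
        ),
    }


def compare(st: PmuSample, ht: PmuSample) -> dict:
    """HT-to-ST ratio of each derived metric (values above 1.0 mean higher under HT)."""
    st_m, ht_m = derived_metrics(st), derived_metrics(ht)
    return {name: ratio_f(ht_m[name], st_m[name]) for name in st_m}


def ratio_f(a: float, b: float) -> float:
    """Float variant of ratio() for comparing derived metrics."""
    return a / b if b else 0.0


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    st_run = PmuSample(1000, 2000, 90, 10, 8, 2, 300, 100)
    ht_run = PmuSample(1500, 2000, 80, 20, 7, 3, 300, 100)
    print(derived_metrics(st_run))
    print(compare(st_run, ht_run))
```

A higher per-core IPC under HT alongside a higher cache miss ratio would be one way the pattern reported in the abstract could appear: better utilization of the core's resources without a proportional gain in application performance.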
