Pthreads Performance Characteristics on Shared Cache CMP, Private Cache CMP and SMP

With the wide availability of chip multi-processing (CMP), software developers now face the task of parallelizing their code effectively. Once they have identified the regions to parallelize, they need to know what level of code granularity makes parallel execution profitable. The problem is compounded by the variety of hardware available. In this paper, we present a novel approach for fairly comparing hardware configurations by simulating them on a pair of quad-core processors. The simulated configurations represent shared cache CMP, private cache CMP and symmetric multiprocessor (SMP) environments. We then present a modified lmbench micro-benchmark suite that measures the cost of threading on these hardware configurations. In our empirical studies, we observe that shared cache CMP performs better when the operating system's load balancer is highly active. However, the measurements also indicate that thread size is an important consideration, because cache thrashing can occur when processing cores share a cache. Private cache CMP and SMP show no significant difference in our measurements. The techniques presented can be incorporated into integrated development environments, compilers and potentially even other run-time environments.