Memory Bound vs. Compute Bound: A Quantitative Study of Cache and Memory Bandwidth in High Performance Applications

High performance applications depend on high utilization of memory bandwidth and computing resources, and are most often limited by either memory or compute speed. Memory bound applications push the limits of the system's memory bandwidth, while compute bound applications push the compute capabilities of the processor. Hierarchical caches are standard components of modern processors, designed to increase memory bandwidth and decrease average latency, particularly when successive memory accesses are spatially local. This paper presents and analyzes measured bandwidths for the various levels of CPU cache and for varying numbers of processor cores. We have modified the STREAM benchmark [1] to include a more accurate timer and greater multicore capability in order to accurately measure bandwidth over a range of memory sizes spanning from the L1 cache to main memory. While the STREAM benchmark is designed to test peak bandwidth for sequential memory access, this paper extends it to analyze the effects on bandwidth of random memory access and increased computational intensity, two common scenarios in High Performance Computing.
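To make the measurement approach concrete, the sketch below shows a STREAM-style triad kernel timed with a monotonic high-resolution clock and parallelized with OpenMP. It is an illustrative assumption of how such a measurement can be structured, not the paper's actual modified benchmark; the array size, repetition count, and reporting format are placeholders.

```c
/*
 * Illustrative sketch only: a STREAM-style triad kernel timed with
 * clock_gettime(CLOCK_MONOTONIC) and parallelized with OpenMP.
 * N, scalar, and NTIMES are placeholder values, not the paper's
 * actual configuration. Compile with: cc -O2 -fopenmp stream_triad.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 24)   /* elements per array; vary to sweep from L1 to DRAM */
#define NTIMES 10      /* repetitions; report the best (minimum) time       */

static double elapsed_sec(struct timespec s, struct timespec e) {
    return (e.tv_sec - s.tv_sec) + (e.tv_nsec - s.tv_nsec) * 1e-9;
}

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    const double scalar = 3.0;

    /* First-touch initialization so pages are placed near the threads */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double best = 1e30;
    for (int k = 0; k < NTIMES; k++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + scalar * c[i];      /* triad: 2 reads + 1 write */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double t = elapsed_sec(t0, t1);
        if (t < best) best = t;
    }

    /* Triad moves 3 doubles (24 bytes) per element per iteration */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("Triad best bandwidth: %.2f GB/s\n", gbytes / best);

    free(a); free(b); free(c);
    return 0;
}
```

Sweeping N from sizes that fit in L1 up to sizes far larger than the last-level cache, and repeating the run with different OpenMP thread counts, yields the bandwidth-versus-working-set curves this paper analyzes.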