Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture

A major challenge in the design of contemporary microprocessors is the increasing number of cores in conjunction with the persistent need for cache coherence. To achieve this, the memory subsystem steadily gains complexity, which has evolved to levels beyond the comprehension of most application performance analysts. The Intel Haswell-EP architecture is such an example. It includes considerable advancements regarding memory hierarchy, on-chip communication, and cache coherence mechanisms compared to the previous generation. We have developed sophisticated benchmarks that allow us to perform in-depth investigations with full control over memory location and coherence state. Using these benchmarks, we investigate performance data and architectural properties of the Haswell-EP micro-architecture, including important memory latency and bandwidth characteristics as well as the cost of core-to-core transfers. This allows us to further the understanding of such complex designs by documenting implementation details that are either not publicly available at all or only indirectly documented through patents.
