An Energy Efficiency Feature Survey of the Intel Haswell Processor

The recently introduced Intel Xeon E5-1600 v3 and E5-2600 v3 series processors -- codenamed Haswell-EP -- implement major changes compared to their predecessors. Among these changes are integrated voltage regulators that enable individual voltages and frequencies for every core. In this paper we analyze a number of consequences of this development that are of utmost importance for energy efficiency optimization strategies such as dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT). This includes the enhanced RAPL implementation and its improved accuracy as it moves from modeling to actual measurement. Another fundamental change is that every clock speed above AVX frequency -- including nominal frequency -- is opportunistic and unreliable, which vastly decreases performance predictability with potential effects on scalability. Moreover, we characterize significantly changed p-state transition behavior, and determine crucial memory performance data.

[1]  Wolfgang E. Nagel,et al.  Power measurement techniques on standard compute nodes: A quantitative comparison , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[2]  Martin Schulz,et al.  Beyond DVFS: A First Look at Performance under a Hardware-Enforced Power Bound , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.

[3]  Matthias S. Müller,et al.  Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[4]  Joseph Shor,et al.  A Fully Integrated Multi-CPU, Processor Graphics, and Memory Controller 32-nm Processor , 2012, IEEE Journal of Solid-State Circuits.

[5]  Wilfred Gomes,et al.  5.9 Haswell: A family of IA 22nm processors , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[6]  Michael Werner,et al.  Wake-up latencies for processor idle states on current x86 processors , 2014, Computer Science - Research and Development.

[7]  Rajesh Kumar,et al.  Haswell: A Family of IA 22 nm Processors , 2015, IEEE Journal of Solid-State Circuits.

[8]  Wei Chen,et al.  5.4 Ivytown: A 22nm 15-core enterprise Xeon® processor family , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[9]  William Jalby,et al.  Evaluation of CPU frequency transition latency , 2014, Computer Science - Research and Development.

[10]  Ravi Iyer,et al.  Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[11]  Matthias S. Müller,et al.  Characterizing the energy consumption of data transfers and arithmetic operations on x86−64 processors , 2010, International Conference on Green Computing.

[12]  Bo Wang,et al.  A 1.2 TB/s on-chip ring interconnect for 45nm 8-core enterprise Xeon® processor , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[13]  Fabrice Paillet,et al.  FIVR — Fully integrated voltage regulators on 4th generation Intel® Core™ SoCs , 2014, 2014 IEEE Applied Power Electronics Conference and Exposition - APEC 2014.

[14]  Robert Schöne,et al.  Memory Performance at Reduced CPU Clock Speeds: An Analysis of Current x86_64 Processors , 2012, HotPower.

[15]  Gerhard Wellein,et al.  LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.

[16]  Robert Schöne,et al.  Introducing FIRESTARTER: A processor stress test utility , 2013, 2013 International Green Computing Conference Proceedings.