Improving Energy Efficiency through Parallelization and Vectorization on Intel Core i5 and i7 Processors

Driven by the utilization wall and the Dark Silicon effect, energy efficiency has become a key research area in microprocessor design. Vectorization, parallelization, specialization and heterogeneity are the key design points for dealing with the utilization wall. Heterogeneous architectures are enhanced with architectural optimizations, such as vectorization, to further increase the energy efficiency of the processor by reducing the number of instructions that go through the pipeline and making better use of the memory hierarchy. AMD® Fusion™ and Intel Core i5 and i7 are commercial examples of this new generation of microprocessors. Still, one question remains to be answered: how can software developers maximize the energy efficiency of these architectures? In this paper, we evaluate the energy efficiency of different processors from the Intel Core i5 and i7 family, using selected benchmarks from the PARSEC suite with variable core counts and vectorization techniques to quantify energy efficiency under the Thermal Design Power (TDP). Results show that software developers should prioritize vectorization over parallelization whenever possible, as it yields considerably better energy efficiency. When vectorization and parallelization are used simultaneously, the scalability of the application can drop drastically, and different development strategies may be required to maximize resource utilization and thereby increase energy efficiency. This is especially true in the server market, where boards often carry more than one processor. Finally, when comparing on-chip and “at the wall” energy savings, we see variations from 5 to 20%, depending on the benchmark and system. This high variability shows the need for a more detailed model that predicts system power from on-chip power information.
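As a minimal sketch of the kind of comparison described above, the following Python snippet computes energy as average power times runtime and derives the relative savings of an optimized run over a baseline, once from on-chip figures and once from "at the wall" figures. The `Run` class, the `energy_savings` helper, and every numeric value are hypothetical placeholders chosen for illustration; they are not measurements or definitions taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run: average power draw (watts) and runtime (seconds)."""
    avg_power_w: float
    runtime_s: float

    @property
    def energy_j(self) -> float:
        # Energy (joules) = average power (watts) * time (seconds).
        return self.avg_power_w * self.runtime_s


def energy_savings(baseline: Run, optimized: Run) -> float:
    """Fraction of the baseline energy saved by the optimized run."""
    return 1.0 - optimized.energy_j / baseline.energy_j


# Hypothetical placeholder values, NOT results from the paper.
scalar_chip = Run(avg_power_w=20.0, runtime_s=10.0)   # on-chip, scalar code
vector_chip = Run(avg_power_w=22.0, runtime_s=5.0)    # on-chip, vectorized code
scalar_wall = Run(avg_power_w=60.0, runtime_s=10.0)   # whole system, scalar code
vector_wall = Run(avg_power_w=63.0, runtime_s=5.0)    # whole system, vectorized code

on_chip = energy_savings(scalar_chip, vector_chip)
at_wall = energy_savings(scalar_wall, vector_wall)

print(f"on-chip savings:     {on_chip:.1%}")
print(f"at-the-wall savings: {at_wall:.1%}")
print(f"difference:          {abs(at_wall - on_chip):.1%}")
```

The two savings figures differ because the system-level measurement includes power drawn by components outside the processor package, which is the gap a system power model based on on-chip information would need to capture.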