Measuring and Exploiting Guardbands of Server-Grade ARMv8 CPU Cores and DRAMs

In this paper, we present the results of our comprehensive measurement study of the timing and voltage guardbands in memories and cores of a commodity ARMv8 based micro-server. Using various synthetic micro-benchmarks, we reveal how the adopted voltage margins vary among the 8 cores of the CPU chip, and among 3 different sigma chips and we show how prone they are to worst-case voltage noise. In addition, we characterize the variation of 'weak' DRAM cells in terms of their retention time across 72 DRAM chips and evaluate the error mitigation efficacy of the available error-correcting codes in case of operation under aggressively relaxed refresh periods. Finally, we show the overall energy savings that could be achieved by shaving the adopted guardbands in the cores and memories using various applications. Our characterization results show the potential to obtain up-to 38.8% energy savings in cores and up-to 27.3% within DRAMs.

[1]  Onur Mutlu,et al.  An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms , 2013, ISCA.

[2]  Dimitris Gizopoulos,et al.  Micro-Viruses for Fast System-Level Voltage Margins Characterization in Multicore CPUs , 2018, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[3]  James Tschanz,et al.  Parameter variations and impact on circuits and microarchitecture , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[4]  Gaurav Singh,et al.  AppliedMicro X-Gene2 , 2014, IEEE Hot Chips Symposium.

[5]  Dimitrios S. Nikolopoulos,et al.  Relaxing DRAM refresh rate through access pattern scheduling: A case study on stencil-based algorithms , 2017, 2017 IEEE 23rd International Symposium on On-Line Testing and Robust System Design (IOLTS).

[6]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[7]  Shidhartha Das,et al.  14.6 An all-digital power-delivery monitor for analysis of a 28nm dual-core ARM Cortex-A57 cluster , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[8]  Hyungmin Cho,et al.  Cross-layer error resilience for robust systems , 2010, 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[9]  Marco A. Antoniades,et al.  Sensing CPU Voltage Noise Through Electromagnetic Emanations , 2018, IEEE Computer Architecture Letters.

[10]  John Kalamatianos,et al.  On characterizing near-threshold SRAM failures in FinFET technology , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[11]  Michael D. Smith,et al.  Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[12]  Lizy Kurian John,et al.  AUDIT: Stress Testing the Automatic Way , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[13]  Shidhartha Das,et al.  Harnessing Voltage Margins for Energy Efficiency in Multicore CPUs , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[15]  Alaa R. Alameldeen,et al.  Trading off Cache Capacity for Reliability to Enable Low Voltage Operation , 2008, 2008 International Symposium on Computer Architecture.

[16]  Paolo A. Aseron,et al.  A 45 nm Resilient Microprocessor Core for Dynamic Variation Tolerance , 2011, IEEE Journal of Solid-State Circuits.

[17]  Kaushik Roy,et al.  Containing the Nanometer “Pandora-Box”: Cross-Layer Design Techniques for Variation Aware Low Power Systems , 2011, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[18]  Pradip Bose,et al.  Voltage Noise in Multi-Core Processors: Empirical Characterization and Optimization Opportunities , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.