Simulating Wear-out Effects of Asymmetric Multicores at the Architecture Level

As the silicon industry moves into deep nanoscale technologies, preserving Mean Time to Failure (MTTF) at acceptable levels becomes a first-order challenge. Operational stress, together with inefficient power dissipation and unsustainable thermal thresholds, increases wear-induced failures. As a result, faster wear-out leads to earlier performance degradation and eventual device breakdown. Furthermore, the proliferation of asymmetric multicores is tightly coupled with an increasing susceptibility to variable wear-out rates across processor components. This paper investigates the reliability boundaries of asymmetric multicores, which span from embedded systems to high-performance computing domains, by performing a continuous-operation reliability assessment. As our experimental analysis illustrates, the gap between the least and the most aged hardware resource is 2.6 years. Motivated by this finding, we show that an MTTF-aware asymmetric configuration prolongs its lifetime by 21%.
