Facelift: Hiding and slowing down aging in multicores

Processors progressively age during their service life due to normal workload activity. Such aging results in gradually slower circuits. Anticipating this fact, designers add timing guardbands to processors, so that processors last for a number of years. As a result, aging has important design and cost implications. To address this problem, this paper shows how to hide the effects of aging and how to slow it down. Our framework is called Facelift. It hides aging through aging-driven application scheduling. It slows down aging by applying voltage changes at key times - it uses a non-linear optimization algorithm to carefully balance the impact of voltage changes on the aging rate and on the critical path delays. Moreover, Facelift can gainfully configure the chip for a short service life. Simulation results indicate that Facelift leads to more cost-effective multicores. We can take a multicore designed for a 7-year service life and, by hiding and slowing down aging, enable it to run, on average, at a 14-15% higher frequency during its whole service life. Alternatively, we can design the multicore for a 5 to 7-month service life and still use it for 7 years.

[1]  Shuguang Feng,et al.  Self-calibrating Online Wearout Detection , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[2]  S. P. Park,et al.  Estimation of statistical variation in temporal NBTI degradation and its impact on lifetime circuit performance , 2007, ICCAD 2007.

[3]  Lorenz T. Biegler,et al.  On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming , 2006, Math. Program..

[4]  S. Mahlke,et al.  Online Timing Analysis for Wearout Detection , 2006 .

[5]  Kevin Skadron,et al.  Temperature-aware microarchitecture , 2003, ISCA '03.

[6]  Shekhar Y. Borkar,et al.  Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.

[7]  J. W. McPherson,et al.  Reliability challenges for 45nm and beyond , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[8]  Pradip Bose,et al.  The case for lifetime reliability-aware microprocessors , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[9]  Eiji Takeda,et al.  Hot-Carrier Effects in MOS Devices , 1995 .

[10]  Scott A. Mahlke,et al.  Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[11]  T. Chen,et al.  Comparison of adaptive body bias (ABB) and adaptive supply voltage (ASV) for improving delay and leakage under the presence of process variation , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[12]  Yean-Kuen Fang,et al.  Temperature dependence of hot-carrier-induced degradation in 0.1 μm SOI nMOSFETs with thin oxide , 2002, IEEE Electron Device Letters.

[13]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[14]  Yun Zhang,et al.  Revisiting the Sequential Programming Model for Multi-Core , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[15]  Kaushik Roy,et al.  Negative Bias Temperature Instability: Estimation and Design for Improved Reliability of Nanoscale Circuits , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[16]  A. R. Newton,et al.  Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas , 1990 .

[17]  David M. Brooks,et al.  Mitigating the Impact of Process Variations on Processor Register Files and Execution Units , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[18]  Lawrence T. Clark,et al.  An embedded 32-b microprocessor core for low-power and high-performance applications , 2001 .

[19]  Ogawa,et al.  Generalized diffusion-reaction model for the low-field charge-buildup instability at the Si-SiO2 interface. , 1995, Physical review. B, Condensed matter.

[20]  C.H. Kim,et al.  An Analytical Model for Negative Bias Temperature Instability , 2006, 2006 IEEE/ACM International Conference on Computer Aided Design.

[21]  Ming Zhang,et al.  Circuit Failure Prediction and Its Application to Transistor Aging , 2007, 25th IEEE VLSI Test Symposium (VTS'07).

[22]  Yu Cao,et al.  Modeling and minimization of PMOS NBTI effect for robust nanometer design , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[23]  K. Yamaguchi,et al.  The impact of bias temperature instability for direct-tunneling ultra-thin gate oxide on MOSFET scaling , 1999, 1999 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.99CH36325).

[24]  Yu Cao,et al.  Compact Modeling and Simulation of Circuit Reliability for 65-nm CMOS Technology , 2007, IEEE Transactions on Device and Materials Reliability.

[25]  Narayanan Vijaykrishnan,et al.  Impact of NBTI on FPGAs , 2007, 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID'07).

[26]  Pradip Bose,et al.  Exploiting structural duplication for lifetime reliability enhancement , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[27]  Emil Talpes,et al.  Variability and energy awareness: a microarchitecture-level perspective , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[28]  Kevin Skadron,et al.  HotLeakage: A Temperature-Aware Model of Subthreshold and Gate Leakage for Architects , 2003 .

[29]  Yukiharu Uraoka,et al.  Hot Carrier Effect in UltraThin Gate Oxide Metal Oxide Semiconductor Field Effect Transistor , 2005 .

[30]  Saurabh Dighe,et al.  An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[31]  Pradip Bose,et al.  A Proactive Wearout Recovery Approach for Exploiting Microarchitectural Redundancy to Extend Cache SRAM Lifetime , 2008, 2008 International Symposium on Computer Architecture.

[32]  Babak Falsafi,et al.  Detecting Emerging Wearout Faults , 2007 .

[33]  Kevin Skadron,et al.  Impact of Process Variations on Multicore Performance Symmetry , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[34]  Josep Torrellas,et al.  Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors , 2008, 2008 International Symposium on Computer Architecture.

[35]  Jaume Abella,et al.  NBTI-Resilient Memory Cells with NAND Gates for Highly-Ported Structures , 2007 .

[36]  Jaume Abella,et al.  Penelope: The NBTI-Aware Processor , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[37]  Sani R. Nassif,et al.  High Performance CMOS Variability in the 65nm Regime and Beyond , 2007 .

[38]  D. Kwong,et al.  Dynamic NBTI of p-MOS transistors and its impact on MOSFET scaling , 2002, IEEE Electron Device Letters.

[39]  J. Torrellas,et al.  VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects , 2008, IEEE Transactions on Semiconductor Manufacturing.

[40]  James D. Meindl,et al.  Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration , 2002, IEEE J. Solid State Circuits.