LifeSim: A lifetime reliability simulator for manycore systems

The increasing demand for high-performance applications, together with advances in technology, has led to power-hungry manycore processors and rising chip temperatures. As a result, devices are increasingly susceptible to wearout and aging, which cause early failure of the processing cores. System-level analysis and optimization techniques offer a holistic view and ample opportunities to address lifetime reliability challenges, which can be explored and evaluated with the help of a fast and accurate simulation environment. This paper presents LifeSim, a simulation tool that integrates i) a state-of-the-art manycore simulator, ii) a thermal simulator, and iii) a lifetime reliability analyzer. The tool is configured through a configuration file, without any code modification or recompilation. To facilitate the development of solutions that mitigate aging and improve lifetime reliability, we enhanced the simulator with scheduling and frequency control features: it offers both preemptive and non-preemptive scheduling, along with an interface for dynamic voltage and frequency scaling (DVFS). It also logs statistics such as power, temperature, aging, and mean time to failure (MTTF), and generates graphs for visualizing and comparing the performance of the solution adopted by the user.
