Temperature-Centric Reliability Analysis and Optimization of Electronic Systems Under Process Variation

Electronic system designs that ignore process variation are unreliable and inefficient. In this paper, we propose a system-level framework for the analysis of temperature-induced failures that accounts for the uncertainty due to process variation. As an intermediate step, we also develop a probabilistic technique for dynamic steady-state temperature analysis. Given an electronic system under a certain workload, our framework delivers the corresponding survival function, which is based on well-established reliability models and has a closed-form stochastic parameterization in terms of the quantities that are uncertain at the design stage. The proposed solution is illustrated on systems with periodic workloads that suffer from thermal-cycling fatigue. Analyzing this fatigue is challenging because it requires detailed temperature profiles, which are themselves uncertain due to the variability of process parameters. To demonstrate the computational efficiency of our framework, we perform a design-space exploration that minimizes the expected energy consumption under a set of timing, thermal, and reliability constraints.
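To make the notion of a survival function with uncertain parameters concrete, the following minimal sketch averages a conditional survival function over samples of an uncertain quantity. It rests on several assumptions that are not taken from the paper: a Weibull lifetime model, a log-normal spread of its scale parameter standing in for process variation, and plain Monte Carlo averaging in place of the closed-form parameterization mentioned above. All function names and numeric values are hypothetical.

```python
import math
import random


def weibull_survival(t, scale, shape):
    """Survival function of a Weibull lifetime: S(t) = exp(-(t / scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def sample_scales(n, nominal_scale, sigma, seed=0):
    """Hypothetical uncertainty model: log-normal spread around a nominal scale."""
    rng = random.Random(seed)
    return [nominal_scale * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]


def expected_survival(t, scale_samples, shape):
    """Average the conditional survival function over the sampled scales."""
    total = sum(weibull_survival(t, s, shape) for s in scale_samples)
    return total / len(scale_samples)


# Illustrative values only (assumed, not from the paper).
scales = sample_scales(n=2000, nominal_scale=10.0, sigma=0.2)
times = [0.0, 5.0, 10.0, 20.0]
curve = [(t, expected_survival(t, scales, shape=2.0)) for t in times]

for t, s in curve:
    print(f"t = {t:5.1f}  S(t) = {s:.4f}")
```

Each sample of the scale parameter plays the role of one possible outcome of the manufacturing process, so the averaged curve is the survival probability seen at the design stage, before the actual outcome is known. A reliability constraint in a design-space exploration could then be stated as a lower bound on this curve at a chosen time.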
