NBTI alleviation on FinFET-made GPUs by utilizing device heterogeneity

Recent experimental studies show that the FinFET devices commercialized in recent years suffer more severe negative-bias temperature instability (NBTI) degradation than planar transistors, so processors built with FinFETs need effective mitigation techniques to operate reliably over their lifetime. We propose to address this problem by exploiting device heterogeneity, leveraging the slower NBTI aging rate of planar devices. We focus on modern graphics processing units (GPUs) in this study because they are widely used. We validate the technique by applying it to the warp scheduler and the L2 cache, and demonstrate that NBTI degradation is considerably alleviated with only a slight performance overhead.
