RAMP: A Model for Reliability Aware MicroProcessor Design

This report introduces RAMP, an architectural model for long-term processor reliability measurement. With aggressive transistor scaling and increasing processor power and temperature, reliability loss due to wear-out mechanisms is expected to become a significant issue in microprocessor design. Reliability awareness at the microarchitectural design stage will soon be a necessity, and RAMP provides a convenient abstraction for achieving it. RAMP models chip-wide mean time to failure as a function of the failure rates of individual on-chip structures due to different failure mechanisms, and can be used to evaluate the reliability implications of different applications, architectural features, and processor designs. RAMP is a self-standing module that can be attached to architectural simulators that generate power and temperature measurements, and has so far been ported to IBM’s Turandot processor simulator and the RSIM architectural simulator.
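The idea of deriving a chip-wide mean time to failure from per-structure, per-mechanism failure rates can be illustrated with a minimal sketch. It is not taken from the report: it assumes a series system (the first failure of any structure fails the chip) with constant failure rates, so the rates simply add and the MTTF is the reciprocal of their sum. The function name `chip_mttf`, the structure and mechanism names, and all numeric values are hypothetical.

```python
from typing import Dict


def chip_mttf(failure_rates: Dict[str, Dict[str, float]]) -> float:
    """Combine per-structure, per-mechanism failure rates into one MTTF.

    Assumption (not stated in the abstract): the chip is a series system
    with constant failure rates, so the total rate is the sum of all
    individual rates and MTTF = 1 / total rate.
    """
    total_rate = 0.0
    for structure, by_mechanism in failure_rates.items():
        for mechanism, rate in by_mechanism.items():
            if rate < 0:
                raise ValueError(f"negative rate for {structure}/{mechanism}")
            total_rate += rate
    if total_rate == 0.0:
        return float("inf")  # nothing ever fails under this model
    return 1.0 / total_rate


# Hypothetical failure rates in failures per year (illustrative only).
example_rates = {
    "alu": {"electromigration": 0.010, "oxide_breakdown": 0.015},
    "register_file": {"electromigration": 0.005, "oxide_breakdown": 0.020},
}

mttf_years = chip_mttf(example_rates)
print(f"chip-wide MTTF: {mttf_years:.2f} years")
```

In a setup like the one the abstract describes, each rate would be recomputed from the power and temperature reported by the attached simulator rather than fixed in a table as it is here.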
