Deployment of better than worst-case design: solutions and needs

The advent of nanometer feature sizes in silicon fabrication has created a number of new challenges for computer designers, including growing design complexity and operation in the presence of environmental and device uncertainty. These come on top of the challenges designers already face in scaling system performance while meeting power and reliability budgets. Current design objectives are being met by assigning ever more engineers and lengthening overall design times, a trend that is unsustainable. This paper gives an overview of a novel design strategy, called better than worst-case design, that addresses these challenges by separating the concerns of performance and reliability: complex design components are coupled with simple, reliable checker mechanisms. We present the key aspects of better than worst-case design and survey some recently proposed solutions that deploy this technique in application domains ranging from microprocessors to digital signal processors. We then highlight a few issues that must be addressed to make the approach practical in more general contexts, and suggest possible solutions.
