Overcoming post-silicon validation challenges through Quick Error Detection (QED)

Existing post-silicon validation techniques are generally ad hoc, and their cost and complexity are rising faster than design cost. Systematic approaches to post-silicon validation are therefore essential. Our research indicates that many of the bottlenecks of existing post-silicon validation approaches are direct consequences of very long error detection latencies. Error detection latency is the time elapsed between the activation of a bug during post-silicon validation and its detection or manifestation as a system failure. In our earlier papers, we created the Quick Error Detection (QED) technique to overcome this challenge. QED systematically creates a wide variety of post-silicon validation tests that detect bugs in processor cores and uncore components of multi-core Systems-on-Chip (SoCs) quickly, that is, with very short error detection latencies. In this paper, we present an overview of QED and summarize key results:
1. Error detection latencies of “typical” post-silicon validation tests can reach billions of clock cycles.
2. QED shortens error detection latencies by up to six orders of magnitude.
3. QED enables a 2- to 4-fold improvement in bug coverage.

QED does not require any hardware modification, so it is readily applicable to existing designs.
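To make the definition of error detection latency concrete, the sketch below simulates a test that carries a duplicated ("shadow") copy of its computation and compares the two copies at a configurable interval. This is a toy duplicate-and-compare model in the spirit of QED, not the QED implementation: the single-register model, the bit-flip bug, and the names `run_test`, `bug_step`, and `check_interval` are all hypothetical. It shows how checking only at the end of a long test lets a corrupted value go unnoticed for almost the whole run, while frequent checks bound the latency by the check interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    detected_at: Optional[int]  # step at which the mismatch was observed (None if never)
    latency: Optional[int]      # detected_at minus the bug activation step


def run_test(num_steps: int, bug_step: int, check_interval: int) -> Result:
    """Toy model (hypothetical): accumulate operands in an 'original' register
    and in a duplicated 'shadow' register. A hypothetical bug flips one bit of
    the original copy at bug_step. The copies are compared every
    check_interval steps and once more at the end of the test."""
    original = 0
    shadow = 0
    for step in range(1, num_steps + 1):
        value = (step * 2654435761) & 0xFFFFFFFF  # deterministic pseudo-random operand
        original = (original + value) & 0xFFFFFFFF
        shadow = (shadow + value) & 0xFFFFFFFF
        if step == bug_step:
            original ^= 1 << 7  # hypothetical bug: corrupt bit 7 of the original copy
        if step % check_interval == 0 or step == num_steps:
            if original != shadow:  # check operation: compare original and duplicate
                return Result(detected_at=step, latency=step - bug_step)
    return Result(detected_at=None, latency=None)


# End-of-test check only vs. a check every 100 steps.
end_only = run_test(num_steps=100_000, bug_step=1_234, check_interval=100_000)
qed_style = run_test(num_steps=100_000, bug_step=1_234, check_interval=100)
print(end_only.latency, qed_style.latency)  # 98766 66
```

In this toy model the bit flip changes the original copy by ±128 modulo 2^32, and adding the same operands to both copies preserves that difference, so the error is never masked and the latency depends only on when the next comparison happens. Real bugs can be masked or can corrupt both copies, which this sketch does not model.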