Quick detection of difficult bugs for effective post-silicon validation

We present a new technique for systematically creating post-silicon validation tests that quickly detect bugs in processor cores and uncore components (cache controllers, memory controllers, on-chip networks) of multi-core Systems-on-Chip (SoCs). Such quick detection is essential because long error detection latency, the time elapsed between the occurrence of an error due to a bug and its manifestation as an observable failure, severely limits the effectiveness of existing post-silicon validation approaches. In addition, we provide a list of realistic bug scenarios abstracted from “difficult” bugs that occurred in commercial multi-core SoCs. Our results for an OpenSPARC T2-like multi-core SoC demonstrate that: (1) error detection latencies of “typical” post-silicon validation tests can be very long, up to billions of clock cycles, especially for bugs in uncore components; (2) our new technique shortens error detection latencies by several orders of magnitude, to only a few hundred cycles for most bug scenarios; and (3) our new technique enables a 2-fold increase in bug coverage. An important feature of our technique is that it is implemented entirely in software and requires no hardware modification. Hence, it is readily applicable to existing designs.
