IFRA: Instruction Footprint Recording and Analysis for Post-Silicon Bug Localization in Processors

The objective of IFRA (Instruction Footprint Recording and Analysis) is to overcome the challenges of a very expensive step in post-silicon validation of processors: bug localization in a system setup. IFRA consists of special design and analysis techniques that bridge a major gap between system-level and circuit-level debug. Special hardware recorders, called footprint recording structures (FRSs), record semantic information about the data and control flows of instructions passing through various design blocks of a processor. This information is recorded concurrently during normal processor operation in a post-silicon system validation setup. When a problem is detected, the recorded information is scanned out and analyzed, together with the binary of the application executed during post-silicon validation, using special program analysis techniques to localize the bug. IFRA requires neither full system-level reproduction of bugs nor system-level simulation. Simulation results on a complex superscalar processor demonstrate that IFRA accurately localizes bugs with very little impact on overall chip area.
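To make the record, scan-out, and analyze flow more concrete, here is a minimal Python software model. It is an illustrative sketch, not IFRA's actual hardware design. It assumes each design block has a fixed-depth circular buffer that keeps only its most recent footprints, each consisting of an instruction ID plus some auxiliary information. The names `Footprint`, `FootprintRecorder`, `scan_out_all`, and `blocks_seen_by`, as well as the field layout, are hypothetical. The binary-based program analysis described above is reduced here to a simple lookup of which blocks still hold a footprint for a given instruction.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Footprint:
    inst_id: int  # hypothetical instruction ID tag assigned when the instruction enters the pipeline
    aux: str      # hypothetical auxiliary info (e.g., PC or opcode) recorded by the block


class FootprintRecorder:
    """Hypothetical per-block recorder modeled as a fixed-depth circular buffer."""

    def __init__(self, block: str, depth: int) -> None:
        self.block = block
        # deque with maxlen drops the oldest entry once the buffer is full,
        # mimicking a circular buffer that overwrites old footprints.
        self.entries: Deque[Footprint] = deque(maxlen=depth)

    def record(self, inst_id: int, aux: str) -> None:
        # Called concurrently with normal operation as an instruction passes this block.
        self.entries.append(Footprint(inst_id, aux))

    def scan_out(self) -> List[Footprint]:
        # Oldest-first snapshot of the buffer contents, as they might be shifted out after a failure.
        return list(self.entries)


def scan_out_all(recorders: List[FootprintRecorder]) -> Dict[str, List[Footprint]]:
    # Collect every recorder's contents once a problem has been detected.
    return {r.block: r.scan_out() for r in recorders}


def blocks_seen_by(dump: Dict[str, List[Footprint]], inst_id: int) -> List[str]:
    # Simplified analysis step: which blocks still hold a footprint for this instruction?
    return [block for block, fps in dump.items() if any(f.inst_id == inst_id for f in fps)]


# Usage: a 4-entry "fetch" recorder and a 2-entry "alu" recorder.
fetch = FootprintRecorder("fetch", depth=4)
alu = FootprintRecorder("alu", depth=2)
for i in range(6):
    fetch.record(i, f"pc=0x{0x400 + 4 * i:x}")
    if i % 2 == 0:  # pretend only even-numbered instructions use the ALU
        alu.record(i, "add")

dump = scan_out_all([fetch, alu])
print(blocks_seen_by(dump, 4))  # instruction 4 left footprints in both blocks
```

Because each buffer has a finite depth, only the most recent history survives until scan-out (instructions 0 and 1 have already been overwritten in this run). This is why, in a scheme like this, the analysis has to work from a partial trace rather than a full reproduction of the failure.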
