Error Detection Using Dynamic Dataflow Verification

A significant fraction of the circuitry in a modern processor is dedicated to converting the linear instruction stream into a representation that lets instructions execute in data-dependence order, rather than program order, to extract instruction-level parallelism. All errors caused by hardware faults in this circuitry (the fetch and decode stages, the renaming and scheduling logic, and the commit stage) manifest themselves as incorrectly constructed dataflow graphs. Dynamic dataflow verification (DDFV) compares the dataflow graph that the hardware dynamically constructs and executes against the dataflow graph expected from the static program binary, which is represented by a signature embedded in the instruction stream. The signature comparison enables comprehensive detection of transient errors, permanent errors, and design bugs in the dataflow circuitry. We show that DDFV detects errors with high probability, at low hardware and performance cost.
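The signature comparison described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware scheme itself: the abstract does not specify the signature function or the granularity at which signatures are embedded, so the per-basic-block granularity, the 16-bit polynomial fold, the register names, and every identifier below (`dataflow_edges`, `signature`, `block_passes_check`, `fault_at`) are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# An instruction is (opcode, destination register or None, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]
Edge = Tuple[int, int, int]  # (producer index, consumer index, operand slot)


def dataflow_edges(block: List[Instr],
                   fault_at: Optional[Tuple[int, int, str]] = None) -> List[Edge]:
    """Build the dataflow edges of one basic block.

    The producer is the index of the in-block instruction that last wrote
    the source register, or -1 if the value is live-in to the block.
    fault_at = (instruction index, operand slot, wrong register) models a
    hypothetical renaming fault: that operand reads the wrong register.
    """
    last_writer = {}
    edges = []
    for i, (_op, dest, srcs) in enumerate(block):
        for slot, reg in enumerate(srcs):
            if fault_at is not None and fault_at[:2] == (i, slot):
                reg = fault_at[2]  # injected fault: wrong source mapping
            edges.append((last_writer.get(reg, -1), i, slot))
        if dest is not None:
            last_writer[dest] = i
    return edges


def signature(edges: List[Edge], bits: int = 16) -> int:
    """Fold the edge list into a short signature (assumed polynomial fold)."""
    mask = (1 << bits) - 1
    sig = 0
    for producer, consumer, slot in edges:
        # Shift the producer by one so that live-in (-1) maps to 0.
        for value in (producer + 1, consumer, slot):
            sig = (sig * 31 + value) & mask
    return sig


def block_passes_check(block: List[Instr], embedded_sig: int,
                       fault_at: Optional[Tuple[int, int, str]] = None) -> bool:
    """Compare the dynamically observed signature with the embedded one."""
    return signature(dataflow_edges(block, fault_at)) == embedded_sig


# Hypothetical four-instruction basic block.
BLOCK: List[Instr] = [
    ("load", "r1", ("r2",)),
    ("add", "r3", ("r1", "r4")),
    ("mul", "r5", ("r3", "r1")),
    ("store", None, ("r5", "r2")),
]

# Static side: the signature a compiler or binary editor would embed.
EMBEDDED_SIG = signature(dataflow_edges(BLOCK))

if __name__ == "__main__":
    print(block_passes_check(BLOCK, EMBEDDED_SIG))
    print(block_passes_check(BLOCK, EMBEDDED_SIG, fault_at=(2, 0, "r4")))
```

In this model the fault-free run reproduces the embedded signature, while the injected fault replaces the edge from the `add` to the `mul` with a live-in edge and so changes the signature. A short signature can alias in general, which is consistent with the abstract's claim of detection with high probability rather than certainty.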
