Paper Information - Perturbation-based Fault Screening

Perturbation-based Fault Screening

Fault screeners are a new breed of fault identification technique that can probabilistically detect if a transient fault has affected the state of a processor. We demonstrate that fault screeners function because of two key characteristics. First, we show that much of the intermediate data generated by a program inherently falls within certain consistent bounds. Second, we observe that these bounds are often violated by the introduction of a fault. Thus, fault screeners can identify faults by directly watching for any data inconsistencies arising in an application's behavior. We present an idealized algorithm capable of identifying over 85% of injected faults on the SpecInt suite and over 75% overall. Further, in a realistic implementation on a simulated Pentium-III-like processor, about half of the errors due to injected faults are identified while still in speculative state. Errors detected this early can be eliminated by a pipeline flush. In this paper, we present several hardware-based versions of this screening algorithm and show that flushing the pipeline every time the hardware screener triggers reduces overall performance by less than 1%.
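The two characteristics described above (values stay within consistent bounds, and faults tend to break those bounds) can be illustrated with a small software sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes the simplest possible screener: a per-instruction minimum/maximum range learned from fault-free values. The names `RangeScreener`, `train`, `screen`, and `flip_bit`, and the program-counter key `0x400`, are hypothetical and chosen only for illustration.

```python
class RangeScreener:
    """Illustrative screener: one [min, max] range per static instruction.

    Assumption: bounds are keyed by a hypothetical program-counter value and
    learned from values observed during fault-free execution.
    """

    def __init__(self):
        self.bounds = {}  # pc -> (lowest value seen, highest value seen)

    def train(self, pc, value):
        """Widen the learned range for `pc` so that it includes `value`."""
        if pc in self.bounds:
            lo, hi = self.bounds[pc]
            self.bounds[pc] = (min(lo, value), max(hi, value))
        else:
            self.bounds[pc] = (value, value)

    def screen(self, pc, value):
        """Return True if `value` falls outside the range learned for `pc`.

        An instruction with no learned range never triggers, because there
        is nothing to compare against.
        """
        if pc not in self.bounds:
            return False
        lo, hi = self.bounds[pc]
        return value < lo or value > hi


def flip_bit(value, bit):
    """Model a transient fault as a single bit flip in an integer value."""
    return value ^ (1 << bit)


# Learn bounds from fault-free values produced by one instruction.
screener = RangeScreener()
for v in range(100):
    screener.train(0x400, v)

clean = 42
high_flip = flip_bit(clean, 20)  # large perturbation, leaves the range
low_flip = flip_bit(clean, 0)    # small perturbation, stays inside the range

print(screener.screen(0x400, clean))
print(screener.screen(0x400, high_flip))
print(screener.screen(0x400, low_flip))
```

In this sketch the high-order bit flip is flagged while the low-order flip passes unnoticed, which is one way to see why screening is probabilistic rather than exact. A hardware screener would need a bounded-size structure instead of an unbounded dictionary; that design question is outside what this sketch covers.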
