Encore: Low-cost, fine-grained transient fault recovery
暂无分享,去创建一个
[1] Rakesh Kumar,et al. A numerical optimization-based methodology for application robustification: Transforming applications for error tolerance , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[2] Karthikeyan Sankaralingam,et al. Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.
[3] Douglas L. Jones,et al. Stochastic computation , 2010, Design Automation Conference.
[4] Amin Ansari,et al. Shoestring: probabilistic soft error reliability on the cheap , 2010, ASPLOS XV.
[5] David Blaauw,et al. Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits , 2010, Proceedings of the IEEE.
[6] Peter H. Beckman,et al. Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System , 2009, 2009 International Conference on Parallel Processing Workshops.
[7] David R. Kaeli,et al. Eliminating microarchitectural dependency from Architectural Vulnerability , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[8] Sarita V. Adve,et al. Using likely program invariants to detect hardware errors , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[9] Sarita V. Adve,et al. Understanding the propagation of hard errors to software and implications for resilient system design , 2008, ASPLOS.
[10] Albert Meixner,et al. Argus: Low-Cost, Comprehensive Error Detection in Simple Cores , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[11] Ravishankar K. Iyer,et al. Processor-Level Selective Replication , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[12] Cheng Wang,et al. Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[13] Shubhendu S. Mukherjee,et al. Perturbation-based Fault Screening , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[14] Donald Yeung,et al. Application-Level Correctness and its Impact on Fault Tolerance , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[15] Babak Falsafi,et al. Reunion: Complexity-Effective Multicore Redundancy , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[16] Eric Rotenberg,et al. Understanding prediction-based partial redundant threading for low-overhead, high- coverage fault tolerance , 2006, ASPLOS XII.
[17] Shuguang Feng,et al. Cost-efficient soft error protection for embedded microprocessors , 2006, CASES '06.
[18] Sanjay J. Patel,et al. ReStore: symptom based soft error detection in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[19] David García,et al. NonStop/spl reg/ advanced architecture , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[20] J. Duell. The design and implementation of Berkeley Lab's linux checkpoint/restart , 2005 .
[21] David I. August,et al. SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.
[22] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[23] Joel Emer,et al. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[24] J. Torrellas,et al. ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[25] Shubhendu S. Mukherjee,et al. Detailed design and evaluation of redundant multi-threading alternatives , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[26] Milo M. K. Martin,et al. SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[27] Jason Duell,et al. Requirements for Linux Checkpoint/Restart , 2002 .
[28] Babak Falsafi,et al. Reference idempotency analysis: a framework for optimizing speculative execution , 2001, PPoPP '01.
[29] Shubhendu S. Mukherjee,et al. Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[30] T. A. Gregg,et al. IBM S/390 Parallel Enterprise Server G5 fault tolerance: A historical perspective , 1999, IBM Journal of Research and Development.
[31] Eric Rotenberg,et al. AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[32] R. Ernst,et al. Embedded program timing analysis based on path clustering and architecture classification , 1997, 1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD).
[33] Wen-mei W. Hwu,et al. A software based approach to achieving optimal performance for signature control flow checking , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.
[34] Ravishankar K. Iyer,et al. Automated Derivation of Application-Aware Error Detectors Using Static Analysis: The Trusted Illiac Approach , 2011, IEEE Transactions on Dependable and Secure Computing.
[35] Norman P. Jouppi,et al. A Case Study of Incremental and Background Hybrid In-Memory Checkpointing , 2010 .
[36] Albert Meixner,et al. A: L-C, C E D S C , 2008 .