Survey of fault tolerance techniques for shared memory multicore/multiprocessor systems
[1] John P. Hayes,et al. Online BIST for Embedded Systems , 1998, IEEE Des. Test Comput.
[2] Janak H. Patel,et al. Reliability of scrubbing recovery-techniques for memory systems , 1990 .
[3] Satish Narayanasamy,et al. Respec: efficient online multiprocessor replay via speculation and external determinism , 2010, ASPLOS XV.
[4] Renato J. O. Figueiredo,et al. Towards Byzantine Fault Tolerance in Many-Core Computing Platforms , 2007, 13th Pacific Rim International Symposium on Dependable Computing (PRDC 2007).
[5] Andrew Warfield,et al. Safe Hardware Access with the Xen Virtual Machine Monitor , 2007 .
[6] Derek Hower,et al. Rerun: Exploiting Episodes for Lightweight Memory Race Recording , 2008, 2008 International Symposium on Computer Architecture.
[7] Hongyu Sun,et al. A Survey of Software Fault Tolerance Techniques , 2005 .
[8] Andrea Miele,et al. A software framework for dynamic self-repair in embedded SoCs exploiting reconfigurable devices , 2010, 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR).
[9] Josep Torrellas,et al. Capo: a software-hardware interface for practical deterministic multiprocessor replay , 2009, ASPLOS.
[10] Mark D. Hill,et al. Karma: scalable deterministic record-replay , 2011, ICS '11.
[11] David A. Wood,et al. Calvin: Deterministic or not? Free will to choose , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[12] K. Hiraki,et al. Heterogeneous Functional Units for High Speed Fault-Tolerant Execution Stage , 2007 .
[13] Christian Engelmann,et al. A Framework for Proactive Fault Tolerance , 2008, 2008 Third International Conference on Availability, Reliability and Security.
[14] David I. August,et al. SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.
[15] Sarita V. Adve,et al. mSWAT: Low-cost hardware fault detection and diagnosis for multicore systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Marek Olszewski,et al. Kendo: efficient deterministic multithreading in software , 2009, ASPLOS.
[17] Jeffrey F. Naughton,et al. Real-time, concurrent checkpoint for parallel programs , 1990, PPOPP '90.
[18] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[19] Anoop Gupta,et al. Hive: fault containment for shared-memory multiprocessors , 1995, SOSP.
[20] Bin Jiang,et al. Hierarchical run time deadlock detection in process networks , 2008, 2008 IEEE Workshop on Signal Processing Systems.
[21] Nicolas Ventroux,et al. Analysis of on-line self-testing policies for real-time embedded multiprocessors in DSM technologies , 2010, 2010 IEEE 16th International On-Line Testing Symposium.
[22] Fred B. Schneider,et al. Hypervisor-based fault tolerance , 1996, TOCS.
[23] Stephen A. Edwards,et al. SHIM: a deterministic model for heterogeneous embedded systems , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[24] Emery D. Berger,et al. Dthreads: efficient deterministic multithreading , 2011, SOSP.
[25] Yennun Huang,et al. Software rejuvenation: analysis, module and applications , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[26] Brian Randell,et al. System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.
[27] Ravishankar K. Iyer,et al. Active replication of multithreaded applications , 2006, IEEE Transactions on Parallel and Distributed Systems.
[28] Edward J. McCluskey,et al. Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.
[29] Jonathan M. Smith,et al. A Survey of Software Fault Tolerance Techniques , 1988 .
[30] Scott Shenker,et al. Diverse Replication for Single-Machine Byzantine-Fault Tolerance , 2008, USENIX Annual Technical Conference.
[31] Dan Grossman,et al. CoreDet: a compiler and runtime system for deterministic multithreaded execution , 2010, ASPLOS XV.
[32] Sandip Kundu,et al. BIST to Detect and Characterize Transient and Parametric Failures , 2010, IEEE Design & Test of Computers.
[33] L. Alvisi,et al. A Survey of Rollback-Recovery Protocols , 2002 .
[34] Yale N. Patt,et al. Checkpoint Repair for High-Performance Out-of-Order Execution Machines , 1987, IEEE Transactions on Computers.
[35] R. Baumann. Soft errors in advanced semiconductor devices-part I: the three radiation sources , 2001 .
[36] Carl E. Landwehr,et al. Basic concepts and taxonomy of dependable and secure computing , 2004, IEEE Transactions on Dependable and Secure Computing.
[37] Jeffrey F. Naughton,et al. Low-Latency, Concurrent Checkpointing for Parallel Programs , 1994, IEEE Trans. Parallel Distributed Syst.
[38] Tipp Moseley,et al. PLR: A Software Approach to Transient Fault Tolerance for Multicore Architectures , 2009, IEEE Transactions on Dependable and Secure Computing.
[39] Karthikeyan Sankaralingam,et al. Sampling + DMR: Practical and low-overhead permanent fault detection , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[40] Yutaka Ishikawa,et al. A New Concurrent Checkpoint Mechanism for Real-Time and Interactive Processes , 2010, 2010 IEEE 34th Annual Computer Software and Applications Conference.
[41] Dimitris Gizopoulos,et al. Software-based self-testing of embedded processors , 2005, IEEE Transactions on Computers.
[42] James E. Smith,et al. Configurable isolation: building high availability systems with commodity multi-core processors , 2007, ISCA '07.
[43] Stefan Götz,et al. Unmodified Device Driver Reuse and Improved System Dependability via Virtual Machines , 2004, OSDI.