Architectures for online error detection and recovery in multicore processors
暂无分享,去创建一个
Sarita V. Adve | Mihalis Psarakis | Dimitris Gizopoulos | A. Biswas | Albert Meixner | Daniel J. Sorin | Xavier Vera | Pradeep Ramachandran | Siva Kumar Sastry Hari | X. Vera | Arijit Biswas | S. Adve | D. Sorin | M. Psarakis | D. Gizopoulos | S. Hari | Pradeep Ramachandran | A. Meixner
[1] Albert Meixner,et al. Dynamic Verification of Memory Consistency in Cache-Coherent Multithreaded Computer Architectures , 2006, International Conference on Dependable Systems and Networks (DSN'06).
[2] Sule Ozev,et al. Tolerating hard faults in microprocessor array structures , 2004, International Conference on Dependable Systems and Networks, 2004.
[3] David I. August,et al. SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.
[4] Valeria Bertacco,et al. Dacota: Post-silicon validation of the memory subsystem in multi-core designs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[5] José Duato,et al. A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[6] Mihalis Psarakis,et al. MT-SBST: Self-test optimization in multithreaded multicore architectures , 2010, 2010 IEEE International Test Conference.
[7] Janusz Rajski,et al. Logic BIST for large industrial designs: real issues and case studies , 1999, International Test Conference 1999. Proceedings (IEEE Cat. No.99CH37034).
[8] Babak Falsafi,et al. Reunion: Complexity-Effective Multicore Redundancy , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[9] Amin Ansari,et al. The StageNet fabric for constructing resilient multicore systems , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[10] Edward J. McCluskey,et al. Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..
[11] Albert Meixner,et al. Detouring: Translating software to circumvent hard faults in simple cores , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[12] Sarita V. Adve,et al. Trace-based microarchitecture-level diagnosis of permanent hardware faults , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[13] Irith Pomeranz,et al. Transient-fault recovery for chip multiprocessors , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..
[14] Shantanu Gupta,et al. Architectural core salvaging in a multi-core processor for hard-error tolerance , 2009, ISCA '09.
[15] Onur Mutlu,et al. Software-Based Online Detection of Hardware Defects Mechanisms, Architectural Support, and Evaluation , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[16] Shekhar Y. Borkar,et al. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.
[17] Albert Meixner,et al. Argus: Low-Cost, Comprehensive Error Detection in Simple Cores , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[18] Shubhendu S. Mukherjee,et al. Perturbation-based Fault Screening , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[19] Daniel J. Sorin,et al. Fault Tolerant Computer Architecture , 2009, Fault Tolerant Computer Architecture.
[20] Jacques Henri Collet,et al. Chip Self-Organization and Fault Tolerance in Massively Defective Multicore Arrays , 2011, IEEE Transactions on Dependable and Secure Computing.
[21] Shubhendu S. Mukherjee,et al. Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.
[22] Milo M. K. Martin,et al. SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[23] Albert Meixner,et al. Error Detection via Online Checking of Cache Coherence with Token Coherence Signatures , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[24] Sarita V. Adve,et al. Using likely program invariants to detect hardware errors , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[25] Josep Torrellas,et al. ReViveI/O: efficient handling of I/O in highly-available rollback-recovery servers , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..
[26] Todd M. Austin,et al. A fault tolerant approach to microprocessor design , 2001, 2001 International Conference on Dependable Systems and Networks.
[27] Matteo Sonza Reorda,et al. Microprocessor Software-Based Self-Testing , 2010, IEEE Design & Test of Computers.
[28] Daniel J. Sorin,et al. Core Cannibalization Architecture: Improving lifetime chip performance for multicore processors in the presence of hard faults , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[29] Sanjay J. Patel,et al. ReStore: symptom based soft error detection in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[30] T. N. Vijaykumar,et al. Rescue: a microarchitecture for testability and defect tolerance , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[31] Sarita V. Adve,et al. Understanding the propagation of hard errors to software and implications for resilient system design , 2008, ASPLOS.
[32] Engin Ipek,et al. Utilizing Dynamically Coupled Cores to Form a Resilient Chip Multiprocessor , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[33] Mihalis Psarakis,et al. Software-Based Self-Testing of Symmetric Shared-Memory Multiprocessors , 2009, IEEE Transactions on Computers.
[34] Gary S. Tyson,et al. Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[35] Todd M. Austin,et al. Ultra low-cost defect protection for microprocessor pipelines , 2006, ASPLOS XII.
[36] Dimitris Gizopoulos,et al. Guest Editors' Introduction: Special Section on Dependable Computer Architecture , 2011, IEEE Trans. Computers.
[37] Todd M. Austin,et al. DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[38] S. Adve,et al. LOW-COST HARDWARE FAULT DETECTION AND DIAGNOSIS FOR MULTICORE SYSTEMS RUNNING MULTITHREADED WORKLOADS , 2022 .
[39] Josep Torrellas,et al. ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors , 2002, ISCA.
[40] Doug Burger,et al. Exploiting microarchitectural redundancy for defect tolerance , 2003, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[41] James E. Smith,et al. Configurable isolation: building high availability systems with commodity multi-core processors , 2007, ISCA '07.
[42] Sharad Malik,et al. Runtime validation of memory ordering using constraint graph checking , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[43] Pradip Bose,et al. Exploiting structural duplication for lifetime reliability enhancement , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).