Exploring Salvage Techniques for Multi-core Architectures

As process technology scales, both fabrication-induced and in-operation hard faults will become more prevalent, limiting yield and effective product lifetime. The simultaneous emergence of chip multiprocessors (CMPs) and the revitalization of machine virtualization offer several opportunities for hard-failure tolerance. In this paper, we provide a preliminary analysis of methods for lifetime recoverability on CMPs that contain partially functional execution cores. Specifically, we examine how processor virtualization can help a CMP architecture overcome faults by migrating computation or by virtualizing functionality that the hardware can no longer support as a result of failure.
