Mitigating lower layer failures with adaptive system reconfiguration

Future terascale systems based on sub-22nm technologies will face significant variability and reliability challenges from the transistor level up to the circuit level. In this scenario, a reliable system must be built on top of unreliable components that will degrade and even fail during the normal lifetime of the chip. To this end, we present a high-level reconfiguration approach for future heterogeneous systems that mitigates lower-layer shortcomings and adapts the processor to the user's requirements.
