Low latency reconfiguration mechanism for fine-grained processor internal functional units

The drive for higher performance, lower power consumption, and smaller chip area has been reducing the reliability of electronic devices and shortening the time until wear-out faults occur. Recent research has shown that the functional units within a processor usually execute different numbers of operations when running programs, so each unit wears out at a different rate over its lifetime. Most existing schemes that reconfigure processors in response to detected faults or other processor parameters operate at the level of whole cores, which is a costly way to achieve redundancy. This paper presents a low-latency (approximately one clock cycle), software-controlled mechanism that reconfigures units within processor cores according to predefined parameters. This reconfiguration capability enables features such as wear-out balancing across processor functional units, configuration of units according to the criticality of the tasks running under an operating system, and configurations that improve performance (e.g. parallel execution) when possible. The focus of this paper is to present the implemented low-latency reconfiguration mechanism and to highlight its main possible features.
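To make the wear-out balancing idea concrete, the following is a minimal behavioural sketch in Python, not the paper's hardware implementation. It assumes a pool of redundant functional units of one type, a per-unit operation counter as a stand-in for wear, and a software-written enable mask standing in for a configuration register. The names `FunctionalUnitPool`, `configure`, and `dispatch` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FunctionalUnitPool:
    """Hypothetical model of redundant functional units of one type (e.g. ALUs)."""

    # Operations executed so far by each unit; used here as a simple wear proxy.
    op_counts: List[int]
    # Assumed software-visible enable mask, one flag per unit.
    enabled: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        # By default every unit is enabled.
        if not self.enabled:
            self.enabled = [True] * len(self.op_counts)

    def configure(self, mask: List[bool]) -> None:
        """Model software writing a new configuration (enable mask)."""
        if len(mask) != len(self.op_counts):
            raise ValueError("mask length must match the number of units")
        self.enabled = list(mask)

    def dispatch(self) -> int:
        """Send one operation to the least-used enabled unit and return its index."""
        candidates = [i for i, on in enumerate(self.enabled) if on]
        if not candidates:
            raise RuntimeError("no functional unit is enabled")
        unit = min(candidates, key=lambda i: self.op_counts[i])
        self.op_counts[unit] += 1
        return unit


if __name__ == "__main__":
    pool = FunctionalUnitPool(op_counts=[5, 0, 3])
    print(pool.dispatch())               # least-used unit is chosen
    pool.configure([True, False, True])  # e.g. take unit 1 out of service
    print(pool.dispatch())
    print(pool.op_counts)
```

In this sketch, changing the mask takes effect on the next dispatched operation, which loosely mirrors the low reconfiguration latency described above. A mask that enables several units could equally represent a performance-oriented or criticality-oriented configuration, although how the real mechanism encodes those modes is not specified here.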
