System Adaptivity and Fault-Tolerance in NoC-based MPSoCs: The MADNESS Project Approach

Modern embedded systems increasingly require adaptive run-time management. A system may remap its applications at run time to accommodate current workload conditions, balance load for efficient resource utilization, meet quality-of-service agreements, avoid thermal hot-spots, and reduce power consumption. Because run-time faults become increasingly likely at deep-sub-micron technology nodes, the MADNESS project focuses in particular on graceful degradation through dynamic remapping in the presence of run-time faults. In this paper, we summarize the major results achieved so far in the MADNESS project on system adaptivity and fault-tolerant processing. We report the first results of integrating platform-level and middleware-level support for adaptivity and fault tolerance. A case study demonstrates the survivability of the system by means of a low-overhead process migration mechanism and a near-optimal online remapping heuristic.
