A task remapping technique for reliable multi-core embedded systems

With the continuous scaling of semiconductor technology, circuit lifetime is decreasing, so processor failure has become an important issue in MPSoC design. A software solution for tolerating run-time processor failure is to migrate tasks from the failed processors to the live processors when a failure occurs. Previous work on run-time task migration usually aims to minimize the migration overhead, with or without a given latency constraint. For streaming applications, however, minimizing the throughput degradation is more important than minimizing the migration overhead or the latency. Hence, we propose a task remapping technique that minimizes the throughput degradation, assuming that the migration overhead can be safely amortized. The target multi-core system assumed in this paper consists of processor pools, each of which consists of homogeneous processors. The proposed technique is based on an intensive compile-time analysis of all possible failure scenarios. It involves the following steps: 1) determine the static mapping of tasks onto the live processors, aiming to minimize the throughput degradation; 2) find an optimal processor-to-processor mapping that minimizes the task migration overhead; and 3) store the resulting task remapping information, which includes the task mapping and the processor-to-processor mapping results. Since the task remapping information is pre-computed at compile time for all possible failure scenarios, it must be represented and stored efficiently. At run time, we simply remap the tasks according to the compile-time decision. We examine the scalability of the proposed technique in terms of both the storage space and the run time of the compile-time analysis as the number of failed processors varies. Through intensive experiments, we show that the proposed technique outperforms previous work with respect to application throughput.
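The three steps can be illustrated with a minimal Python sketch for a single pool of homogeneous processors. It is not the authors' algorithm: the greedy load balancer in `balance` is a stand-in for the throughput-oriented mapping of step 1, the exhaustive search in `best_binding` is a stand-in for the optimal processor-to-processor mapping of step 2, and all names (`balance`, `best_binding`, `build_table`, `state_size`, `old_map`) are hypothetical. Migration cost is assumed to be the total state size of the tasks that change processor.

```python
from itertools import combinations, permutations

def balance(tasks, n_procs):
    """Step 1 (stand-in heuristic): greedily place tasks, longest first, on
    n_procs logical processors to keep the bottleneck load small."""
    groups = [[] for _ in range(n_procs)]
    loads = [0] * n_procs
    for name in sorted(tasks, key=lambda t: (-tasks[t], t)):
        i = loads.index(min(loads))          # least-loaded logical processor
        groups[i].append(name)
        loads[i] += tasks[name]
    return groups

def migration_cost(groups, binding, old_map, state_size):
    """Assumed cost model: total state size of tasks that must move."""
    return sum(state_size[t]
               for group, proc in zip(groups, binding)
               for t in group if old_map[t] != proc)

def best_binding(groups, live, old_map, state_size):
    """Step 2 (exhaustive stand-in): bind logical to physical live processors
    so that the migration cost is minimal."""
    best = min(permutations(live),
               key=lambda b: migration_cost(groups, b, old_map, state_size))
    return {t: proc for group, proc in zip(groups, best) for t in group}

def build_table(tasks, state_size, old_map, procs, max_failures):
    """Step 3: pre-compute one remapping per failure scenario."""
    table = {}
    for k in range(1, max_failures + 1):
        for failed in combinations(procs, k):
            live = [p for p in procs if p not in failed]
            if not live:
                continue                     # no live processor left
            groups = balance(tasks, len(live))
            table[frozenset(failed)] = best_binding(
                groups, live, old_map, state_size)
    return table

# Hypothetical example: four tasks on a pool of three processors.
tasks = {"a": 4, "b": 3, "c": 2, "d": 1}       # execution time per task
state_size = {"a": 1, "b": 1, "c": 1, "d": 1}  # state to transfer on migration
old_map = {"a": 0, "b": 1, "c": 2, "d": 2}     # mapping before any failure
table = build_table(tasks, state_size, old_map, procs=[0, 1, 2], max_failures=2)

# Run-time side: a table lookup keyed by the set of failed processors.
new_map = table[frozenset({0})]
```

The table holds one entry per failure scenario, so its size grows with the number of ways to choose the failed processors from the pool. This is why the storage representation and the compile-time analysis cost matter as the number of tolerated failures increases.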
