Online fault detection in a hardware/software co-design environment: system partitioning

System reliability is receiving considerable attention in the design of systems for critical application fields. Reliability is often addressed at low abstraction levels, toward the end of the design process, which introduces significant overheads. Introducing fault detection requirements at the system level, when a hardware/software co-design process is carried out, makes it possible to evaluate the overheads and benefits of different solutions. The traditional partitioning phase has been modified to take reliability into account, so that, among the reliable solutions identified, it selects the one that best meets the user's requirements. The paper presents the partitioning step of a co-design flow aimed at providing fault detection properties to the final system; this step selects the hardware tasks and the software tasks that implement both the system functionality and the checking capabilities.
