Reliability of fault-tolerant systems with parallel task processing

Abstract This paper considers the performance and reliability of fault-tolerant software running on a hardware system that consists of multiple processing units. The software consists of functionally equivalent but independently developed versions that start execution simultaneously. The versions differ in computational complexity and reliability. The system completes task execution when the outputs of a pre-specified number of versions coincide. The processing units differ in availability and processing speed. It is assumed that they can share the computational burden perfectly and that the execution of each version can be fully parallelized. An algorithm based on the universal generating function technique is used to determine the distribution of the system task execution time. This algorithm allows analysts to evaluate combined hardware–software reliability and performance indices, such as the expected task execution time and the probability that the task is completed within a given time. Illustrative examples are also presented.
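To make the hardware side of this approach concrete, the following is a minimal sketch of the universal generating function idea, not the paper's algorithm. It assumes each processing unit is either available at its full speed or unavailable, represents a unit's u-function as a mapping from speed to probability, and composes units with a sum operator because they are assumed to share the load perfectly. It then converts the total-speed distribution into an execution-time distribution for a single version of a given complexity. Version voting and software reliability are deliberately left out, and all function names and numeric values are hypothetical.

```python
from collections import defaultdict


def unit_ugf(speed, availability):
    """u-function of one unit: `speed` with prob. a, speed 0 with prob. 1 - a."""
    u = defaultdict(float)
    u[float(speed)] += availability
    u[0.0] += 1.0 - availability
    return dict(u)


def compose(u1, u2, op):
    """Generic UGF composition: combine every pair of terms with `op`."""
    out = defaultdict(float)
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            out[op(g1, g2)] += p1 * p2  # probabilities multiply, like terms merge
    return dict(out)


def total_speed_ugf(units):
    """Distribution of total processing speed for (speed, availability) pairs."""
    total = {0.0: 1.0}  # neutral element for the sum operator
    for speed, availability in units:
        total = compose(total, unit_ugf(speed, availability), lambda a, b: a + b)
    return total


def execution_time_distribution(speed_ugf, complexity):
    """Map each total speed to the time needed for `complexity` operations."""
    out = defaultdict(float)
    for speed, prob in speed_ugf.items():
        # Zero total speed means the work never finishes (assumed: no repair).
        time = complexity / speed if speed > 0 else float("inf")
        out[time] += prob
    return dict(out)


def prob_done_within(time_dist, deadline):
    """Probability that execution finishes no later than `deadline`."""
    return sum(p for t, p in time_dist.items() if t <= deadline)


def conditional_expected_time(time_dist):
    """Expected execution time, given that the work finishes at all."""
    finite = [(t, p) for t, p in time_dist.items() if t != float("inf")]
    mass = sum(p for _, p in finite)
    return sum(t * p for t, p in finite) / mass


if __name__ == "__main__":
    # Hypothetical data: two units, one version needing 30 operations.
    units = [(2.0, 0.9), (3.0, 0.8)]
    dist = execution_time_distribution(total_speed_ugf(units), 30.0)
    print(prob_done_within(dist, 10.0), conditional_expected_time(dist))
```

The dictionary representation keeps the number of terms small because equal exponents are merged at every composition step, which is what makes the technique tractable when many units are combined.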
