Lifetime Reliability for Load-Sharing Redundant Systems With Arbitrary Failure Distributions

In this work, a general closed-form expression is presented for the lifetime reliability of load-sharing k-out-of-n:G hybrid redundant systems. In such systems, m components are initially configured as active units. Depending on whether it is performing tasks, an active component can be in either a processing state or a wait state, and each state corresponds to an arbitrary failure distribution. The remaining (n-m) components are spares that provide fault tolerance. Each time an active component fails, a spare is switched into active mode, until no spares remain in the system. The system then degrades gracefully, with fewer than m components sharing the workload, until the number of good components falls below k. Task allocation and service are modeled as queueing systems, in which the utilization ratio strongly affects how components age. We integrate the failure distributions for components in different operational states into an analytical model according to the statistical properties of the task allocation mechanisms and the components' processing capacity, and analyze the lifetime reliability of the entire system. Finally, three special cases and a series of numerical experiments are discussed in detail to show the practical applicability of the proposed approach.
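
The system behavior described above (m active units, n-m spares, graceful degradation until fewer than k good components remain) can be illustrated with a small Monte Carlo sketch. It is not the closed-form expression of this work. It rests on simplifying assumptions that are not taken from the paper: exponential lifetimes in both states, spares that do not age while in standby, a workload split evenly over the active units, and an effective hazard obtained by weighting the processing-state and wait-state rates by the utilization ratio. All function and parameter names are hypothetical.

```python
import random


def effective_hazard(active, rate_busy, rate_wait, arrival_rate, service_rate):
    """Approximate per-component hazard for a given number of active units.

    Assumption: the workload is split evenly, so each unit is busy a
    fraction `utilization` of the time (capped at 1) and waiting otherwise.
    """
    utilization = min(1.0, arrival_rate / (active * service_rate))
    return utilization * rate_busy + (1.0 - utilization) * rate_wait


def sample_lifetime(n, m, k, rate_busy, rate_wait, arrival_rate, service_rate, rng):
    """Draw one system lifetime under the simplified exponential model."""
    good = n          # components that have not failed (active units plus spares)
    elapsed = 0.0
    while good >= k:  # the system is up while at least k components are good
        active = min(m, good)  # spares fill in until none are left
        hazard = effective_hazard(active, rate_busy, rate_wait,
                                  arrival_rate, service_rate)
        # Time to the next failure among `active` independent exponential units.
        elapsed += rng.expovariate(active * hazard)
        good -= 1     # assumption: spares are cold, so only active units fail
    return elapsed


def estimate_reliability(t, n, m, k, rate_busy, rate_wait,
                         arrival_rate, service_rate, trials=10000, seed=0):
    """Estimate R(t) = P(system lifetime > t) by simulation."""
    rng = random.Random(seed)
    survived = sum(
        sample_lifetime(n, m, k, rate_busy, rate_wait,
                        arrival_rate, service_rate, rng) > t
        for _ in range(trials)
    )
    return survived / trials


if __name__ == "__main__":
    params = dict(m=4, k=2, rate_busy=0.2, rate_wait=0.05,
                  arrival_rate=2.0, service_rate=1.0)
    for n in (4, 6, 8):
        print(n, estimate_reliability(t=8.0, n=n, **params))
```

Because the exponential distribution is memoryless, this sketch does not need to track the accumulated age of each component. Arbitrary failure distributions, which are the subject of this work, would require that bookkeeping.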
