Achieving and Assuring High Availability

We discuss availability aspects of large software-based systems. We classify faults into Bohrbugs, Mandelbugs and aging-related bugs, and then examine mitigation methods for the last two bug types. We also consider quantitative approaches to availability assurance.
