Reliability and Performance of Component Based Software Systems with Restarts, Retries, Reboots and Repairs

High reliability and performance are vital for software systems that handle diverse mission-critical applications. Such systems are usually component-based and may possess multiple levels of fault recovery. A number of parameters, including the software architecture, the behavior of individual components, the underlying hardware, and the fault recovery measures, affect the behavior of such systems, and an approach is needed to study them. In this paper we present an integrated approach for modeling and analyzing component-based systems with multiple levels of failure and fault recovery at both the software and the hardware level. The approach can be used to analyze attributes such as overall reliability, performance, and machine availability for systems in which failures may occur in the software components, the operating system, or the hardware, and in which corresponding restarts, retries, reboots, or repairs are used for mitigation. Our approach combines Markov chain and queueing network modeling to estimate system reliability, machine availability, and performance. It is helpful both for designing and building better systems and for improving existing ones.
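As a rough illustration of the kind of Markov chain reasoning the abstract refers to, the sketch below computes an architecture-level reliability estimate in the style of Cheung's user-oriented software reliability model, plus the steady-state availability of a machine that alternates between up and down states. It is not the paper's model. The function names, the assumption that retries are independent attempts, and the use of the last component as the only successful exit are all assumptions made for this example.

```python
def effective_reliability(r, retries):
    """Probability that a component succeeds within 1 + retries attempts.

    Assumption: attempts are independent and each succeeds with probability r.
    """
    return 1.0 - (1.0 - r) ** (retries + 1)


def availability(mttf, mttr):
    """Steady-state availability of a two-state (up/down) machine model."""
    return mttf / (mttf + mttr)


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def system_reliability(p, rel):
    """Probability that a run starting at component 0 completes correctly.

    p[i][j] is the probability that control passes from component i to j,
    rel[i] is the reliability of component i. The last component is treated
    as the exit (assumption), so its row in p is ignored.
    """
    n = len(rel)
    # x[i] = probability of eventual success given control is at component i:
    #   x[i] = rel[i] * sum_j p[i][j] * x[j]   for i < n - 1
    #   x[n-1] = rel[n-1]
    a = [[(1.0 if i == j else 0.0) - (rel[i] * p[i][j] if i < n - 1 else 0.0)
          for j in range(n)] for i in range(n)]
    b = [0.0] * (n - 1) + [rel[n - 1]]
    return solve_linear(a, b)[0]


# Hypothetical three-component architecture and reliabilities.
transfer = [[0.0, 0.6, 0.4],
            [0.0, 0.0, 1.0],
            [0.0, 0.0, 0.0]]
base = [0.99, 0.95, 0.98]
with_retry = [effective_reliability(r, retries=1) for r in base]

print(system_reliability(transfer, base))
print(system_reliability(transfer, with_retry))
print(availability(mttf=999.0, mttr=1.0))
```

In this toy setting, a single retry per component raises the system-level estimate because each component's effective reliability increases. A fuller treatment would also account for the time cost of retries, reboots, and repairs, which is where queueing network modeling comes in.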
