Software rejuvenation: analysis, module and applications

Software rejuvenation is the concept of gracefully terminating an application and immediately restarting it at a clean internal state. In a client-server application where the server is intended to run perpetually to provide a service to its clients, rejuvenating the server process periodically during the server's most idle time increases the availability of that service. In a long-running, computation-intensive application, rejuvenating the application periodically and restarting it from a previous checkpoint increases the likelihood of successfully completing the application execution. We present a model for analyzing software rejuvenation in such continuously-running applications and express downtime and costs due to downtime during rejuvenation in terms of the parameters in that model. Threshold conditions for rejuvenation to be beneficial are also derived. We implemented a reusable module to perform software rejuvenation. That module can be embedded in any existing application on a UNIX platform with minimal effort. Experiences with software rejuvenation in a billing data collection subsystem of a telecommunications operations system and in other continuously-running systems and scientific applications at AT&T are described.
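As a rough illustration of the terminate-and-restart cycle described above, the following minimal sketch supervises a child process on a UNIX platform and rejuvenates it at a fixed interval. It is not the reusable module from the paper: the function name `run_with_rejuvenation` and the parameters `interval_s`, `cycles`, and `grace_s` are hypothetical, and the sketch assumes the application shuts down cleanly on SIGTERM. Choosing the most idle time and restoring from a checkpoint are left out.

```python
import signal
import subprocess
import sys


def run_with_rejuvenation(cmd, interval_s, cycles, grace_s=5.0):
    """Start cmd, then terminate and restart it every interval_s seconds.

    Hypothetical helper for illustration only. Returns the exit code of
    each instance that was run, in order.
    """
    exit_codes = []
    for _ in range(cycles):
        # A freshly started process begins from a clean internal state.
        proc = subprocess.Popen(cmd)
        try:
            # If the process exits on its own first, simply record its code.
            proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            # Rejuvenation point: request a graceful shutdown.
            proc.send_signal(signal.SIGTERM)
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                # Assumption: force termination if the grace period expires.
                proc.kill()
                proc.wait()
        exit_codes.append(proc.returncode)
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a long-running server: a child that sleeps for a minute.
    server = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_rejuvenation(server, interval_s=1.0, cycles=3))
```

A fixed interval is the simplest possible policy; whether rejuvenating at a given rate pays off depends on the downtime and cost parameters that the analysis addresses.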
