Modeling software systems with rejuvenation, restoration and checkpointing through fluid stochastic Petri nets

In this paper, we present a Fluid Stochastic Petri Net (FSPN) based model that captures the behavior of aging software systems with checkpointing, rejuvenation and self-restoration, three well-known software fault-tolerance techniques. The proposed FSPN-based modeling framework is novel in several respects. First, the FSPN formalism itself is extended by adding flush-out arcs. Second, the three techniques are captured simultaneously in a single model for the first time. Third, the formalism allows the dependencies of the three techniques on system features such as failure, load and time to be modeled in the same framework. Further, our base FSPN model can be viewed as a generalization of most previous models in the literature. We show that these FSPNs can not only mimic previously published models but also extend them. For one FSPN model, we present numerical results to illustrate how such models are used to derive measures of interest.
