Optimal number of hosts in a distributed system based on cost criteria

Redundant or distributed systems are increasingly used in system design to achieve the required reliability and availability. However, this approach requires additional resources that can be very costly, so developers need to design and test such systems in the most cost-effective way. A general cost model and a solution algorithm are presented for determining the number of hosts and the system debugging time that minimize the total cost while achieving a given performance objective. During testing, software faults are detected and corrected, so software reliability grows and system reliability increases with it. A general system model is constructed based on a Markov process, with software reliability and availability obtained from software reliability growth models. The optimization problem is formulated based on the cost criteria, and the solution procedure is described. An application example is presented.
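
The kind of trade-off described above can be illustrated with a minimal sketch. It is not the paper's model: every function name, parameter value, and modelling choice below is an assumption made for illustration. The sketch assumes a Goel-Okumoto growth curve for the software failure rate after debugging time T, a two-state (up/down) Markov model per host with steady-state availability mu / (lambda + mu), independent hosts with the system up when at least one host is up, and a linear cost in the number of hosts and the debugging time. It then searches a grid for the cheapest pair that meets an availability target.

```python
import math

# All parameter values below are hypothetical, chosen only for illustration.

def host_availability(T, a=100.0, b=0.05, mu=2.0):
    """Steady-state availability of one host after debugging time T.

    Assumes a Goel-Okumoto failure intensity a*b*exp(-b*T) and a
    two-state Markov model with repair rate mu.
    """
    lam = a * b * math.exp(-b * T)
    return mu / (lam + mu)

def system_availability(n_hosts, T):
    """System is up if at least one of n independent hosts is up.

    Independence is a simplifying assumption; it ignores software
    faults that are common to all hosts.
    """
    return 1.0 - (1.0 - host_availability(T)) ** n_hosts

def total_cost(n_hosts, T, c_host=500.0, c_test=20.0):
    """Assumed linear cost: per-host cost plus per-unit-time testing cost."""
    return c_host * n_hosts + c_test * T

def optimize(target=0.999, max_hosts=10, max_T=200, step=1):
    """Grid search for the cheapest (hosts, debugging time) meeting the target."""
    best = None
    for n in range(1, max_hosts + 1):
        for k in range(max_T // step + 1):
            T = k * step
            if system_availability(n, T) >= target:
                cost = total_cost(n, T)
                if best is None or cost < best[0]:
                    best = (cost, n, T)
                # Availability grows with T, so a longer T only adds cost.
                break
    return best  # (cost, n_hosts, T), or None if the target is unreachable

if __name__ == "__main__":
    print(optimize())
```

Because availability in this sketch increases with T for a fixed number of hosts, the inner loop stops at the first feasible debugging time. A model with a richer cost structure, for example one that adds risk or warranty costs, would need a search over the full grid.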
