Reliability and cost optimization in distributed computing systems

The reliability of a distributed computing system depends on the reliability of its communication network and processing units and on the task allocation strategy. System reliability can be improved by adding redundant resources or by using highly reliable components. In this paper, we derive a relationship between system cost and hardware redundancy levels for cycle-free distributed computing systems. Based on this relationship, we propose a hybrid heuristic that combines genetic algorithms with the steepest descent method to find the task allocation and hardware redundancy policies that minimize system cost.
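To make the trade-off between redundancy and cost concrete, the following is a minimal sketch and not the model derived in the paper. It assumes that units fail independently, that each unit is replicated as a group of identical parallel copies, that the system works only if every group works (a series arrangement), and that hardware cost is linear in the number of copies. All function and parameter names are hypothetical.

```python
from math import prod


def redundant_reliability(r: float, copies: int) -> float:
    """Reliability of `copies` identical parallel units.

    Assumes independent failures: the group fails only if every copy fails.
    """
    return 1.0 - (1.0 - r) ** copies


def system_reliability(unit_reliability: list[float], levels: list[int]) -> float:
    """Reliability of a series arrangement of redundant groups (assumed structure)."""
    return prod(
        redundant_reliability(r, k) for r, k in zip(unit_reliability, levels)
    )


def hardware_cost(unit_cost: list[float], levels: list[int]) -> float:
    """Hardware cost under the assumption that it is linear in the copy count."""
    return sum(c * k for c, k in zip(unit_cost, levels))


if __name__ == "__main__":
    reliabilities = [0.9, 0.8]   # illustrative values only
    costs = [10.0, 20.0]         # illustrative values only
    for levels in ([1, 1], [2, 1], [2, 2]):
        print(
            levels,
            round(system_reliability(reliabilities, levels), 4),
            hardware_cost(costs, levels),
        )
```

Under these assumptions, each added copy raises reliability with diminishing returns while cost grows steadily, which is why the redundancy levels have to be chosen by optimization rather than simply maximized.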
