Reliable Task Allocation in Heterogeneous Distributed System with Random Node Failure: Load Sharing Approach

This paper addresses the problem of maximizing the reliability of a heterogeneous distributed computing system in which nodes can fail permanently at random. The system is reliable when all the tasks queued on its nodes are executed before every node has failed. The paper presents a framework to characterize the service reliability of a Distributed Computing System (DCS) in the presence of communication uncertainties and of topological changes caused by node deletion. Because the DCS is heterogeneous, its nodes have different hardware and software characteristics. Likewise, the components of an application have different hardware and software requirements, and the application provides its desired functionality only when these requirements are satisfied. One way to improve the reliability of the DCS is to allocate tasks properly among its nodes. First, we determine, for each task, the candidate nodes that satisfy its requirements. Then we use load-sharing policies to handle node failures and to maximize the service reliability of the DCS.
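
The two steps in the abstract, candidate-node selection and load sharing under permanent node failures, can be illustrated with a minimal Python sketch. It assumes a model that the abstract does not specify: each node offers a set of features and serves tasks at a constant rate mu, node lifetimes are exponential with rate lam, and when a node fails its unfinished tasks are moved to a surviving candidate node with no transfer delay. The names Node, Task, candidate_nodes, pick, static_reliability, simulate_once, and service_reliability, as well as the greedy rule in pick, are hypothetical and are not taken from the paper. Service reliability is estimated as the fraction of Monte Carlo runs in which every task finishes before the node holding it fails.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    resources: frozenset  # hardware/software features offered (hypothetical)
    mu: float             # service rate: tasks per unit time (assumed constant)
    lam: float            # failure rate: lifetime ~ Exponential(lam) (assumed)


@dataclass(frozen=True)
class Task:
    tid: int
    requires: frozenset   # features the task needs (hypothetical)


def candidate_nodes(task, nodes):
    """Step 1: nodes whose resources cover every requirement of the task."""
    return [n for n in nodes if task.requires <= n.resources]


def pick(cands, queues):
    """Hypothetical load-sharing rule: smallest lam * (queue length + 1) / mu."""
    return min(cands, key=lambda n: n.lam * (len(queues[n.name]) + 1) / n.mu)


def initial_allocation(tasks, nodes):
    """Step 2a: place each task on one of its candidate nodes."""
    queues = {n.name: [] for n in nodes}
    for t in tasks:
        cands = candidate_nodes(t, nodes)
        if not cands:
            raise ValueError(f"task {t.tid} has no candidate node")
        queues[pick(cands, queues).name].append(t)
    return queues


def static_reliability(queues, nodes):
    """Closed form when a failed node's tasks are lost: node i must survive
    len(queue_i) / mu_i time units, so R = exp(-sum lam_i * n_i / mu_i)."""
    return math.exp(-sum(n.lam * len(queues[n.name]) / n.mu for n in nodes))


def simulate_once(nodes, queues, reallocate, rng):
    """One run; True if every task completes on some node before it fails."""
    by_name = {n.name: n for n in nodes}
    fail_at = {n.name: rng.expovariate(n.lam) for n in nodes}
    q = {k: list(v) for k, v in queues.items()}
    start = {k: 0.0 for k in q}  # start time of the task at the head of q[k]
    alive = set(q)

    def advance(name, t):
        # Remove the tasks `name` has finished by time t (each takes 1/mu).
        mu = by_name[name].mu
        done = min(len(q[name]), int((t - start[name]) * mu))
        del q[name][:done]
        start[name] = start[name] + done / mu if q[name] else t

    for name in sorted(fail_at, key=fail_at.get):  # failures in time order
        t = fail_at[name]
        for a in alive:
            advance(a, t)
        alive.discard(name)
        leftovers, q[name] = q[name], []
        for task in leftovers:  # unfinished work, including the task in progress
            cands = [n for n in candidate_nodes(task, nodes) if n.name in alive]
            if not reallocate or not cands:
                return False
            # Step 2b: move the task to a survivor; transfer delay is ignored.
            q[pick(cands, q).name].append(task)
    return True


def service_reliability(tasks, nodes, reallocate, runs=20000, seed=0):
    """Monte Carlo estimate of P(all tasks finish before their nodes fail)."""
    rng = random.Random(seed)
    queues = initial_allocation(tasks, nodes)
    wins = sum(simulate_once(nodes, queues, reallocate, rng) for _ in range(runs))
    return wins / runs
```

For example, with three hypothetical nodes offering {"gpu", "linux"}, {"linux"}, and {"linux", "db"}, a task requiring "gpu" has only the first node as a candidate, while a task requiring "linux" can go to any of them. Calling service_reliability once with reallocate=False and once with reallocate=True under the same seed gives every run identical failure times, so the load-sharing estimate can never fall below the static one. Because the sketch ignores transfer delays, it does not capture the communication uncertainties that the framework above accounts for, and it likely overstates the benefit of reallocation.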
