Resource allocation in grid computing

Abstract Grid computing, in which a network of computers is integrated to create a very fast virtual computer, is becoming ever more prevalent. Examples include the TeraGrid and Planet-lab.org, as well as applications on the existing Internet that take advantage of unused computing and storage capacity of idle desktop machines, such as Kazaa, SETI@home, Climateprediction.net, and Einstein@home. Grid computing permits a network of computers to act as a very fast virtual computer. With many alternative computers available, each with varying extra capacity, and each of which may connect or disconnect from the grid at any time, it may make sense to send the same task to more than one computer. The application can then use the output of whichever computer finishes the task first. Thus, the important issue of the dynamic assignment of tasks to individual computers is complicated in grid computing by the option of assigning multiple copies of the same task to different computers. We show that under fairly mild and often reasonable conditions, maximizing task replication stochastically maximizes the number of task completions by any time. That is, it is better to do the same task on as many computers as possible, rather than assigning different tasks to individual computers. We show maximal task replication is optimal when tasks have identical size and processing times have a NWU (New Worse than Used; defined later) distribution. Computers may be heterogeneous and their speeds may vary randomly, as is the case in grid computing environments. We also show that maximal task replication, along with a cμ rule, stochastically maximizes the successful task completion process when task processing times are exponential and depend on both the task and computer, and tasks have different probabilities of completing successfully.