Scheduling data intensive parallel processing in distributed and networked environments

Optimal divisible load scheduling in network environments is examined. A unique optimality proof for parallel processor load scheduling is presented, along with means of finding the optimal schedule. Other topics in this thesis include improving parallel processor speedup through efficient scheduling, a comparison of optimal scheduling with simple equal-division scheduling, and scheduling in a multi-job environment. Finally, expressions for the expected record search time in a parallel database performing file search are developed.

Staggered load scheduling refers to scheduling a load so as to minimize communication delay. In Chap. 2, the staggered load scheduling strategy is compared with arbitrary scheduling strategies. The word "arbitrary" is used in the sense that an M/G/1 queue has a general solution: any scheduling policy may be substituted for the arbitrary one. It is concluded that staggered load scheduling is an optimal scheduling policy, and means of finding the optimal solution are presented (a numerical sketch of such a staggered schedule follows this abstract).

In Chap. 3, it is pointed out that parallel processor speedup may be limited by communication delay: although the number of processors may be increased, speedup eventually saturates, and this limitation is a function of the communication link speed. Various mechanisms that improve speedup are discussed. Multi-installment scheduling, which requires no additional hardware, and multi-channel scheduling, which carries only a low software cost, are proposed as strategies for speedup improvement.

In Chap. 4, an L-level K-ary tree network is considered, and optimal scheduling is compared to simple equal-division scheduling. A closed-form expression for the finish time is also presented. The procedure used to obtain the finish time for the L-level K-ary tree network can be applied to a general tree network.

A critical aspect of distributed systems serving multiple users is load sharing, which balances load across the nodes of a distributed system even though jobs arrive randomly. In Chap. 5, it is pointed out that distributed systems can be utilized for file sharing, multiple-user access, and parallel processing, which improves the overall performance of a distributed system; parallel processing in particular improves speedup. In this chapter, optimal load sharing in a multiple-job environment is examined.

In Chap. 6, elegant expressions for the expected time to find both single and multiple records are derived. A linear daisy chain architecture and a single-level tree network architecture are investigated; for the single-level tree network, both single-installment and multi-installment load distribution are considered. The techniques described here can be used to model and solve for record search times on other architectures (a simplified expected-search-time sketch also follows this abstract). This work is significant in demonstrating the power of divisible load scheduling theory for predicting search times.
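To make the scheduling model concrete, the listing below is a minimal numerical sketch of staggered (sequentially distributed) divisible load scheduling on a homogeneous single-level tree network, written against the standard timing model of divisible load theory rather than taken from the thesis; the processor count N and the parameters w, z, Tcp and Tcm are illustrative assumptions.

    # A minimal sketch (assumed model, not code from the thesis): staggered
    # divisible load scheduling on a homogeneous single-level tree / bus network.
    # N, w, z, tcp, tcm are illustrative values, not figures from the thesis.

    def optimal_fractions(n, w, z, tcp, tcm):
        """Fractions alpha_1..alpha_n when the root transmits them one after
        another (staggered) and all processors finish computing simultaneously,
        the usual optimality condition in divisible load theory."""
        q = (w * tcp) / (z * tcm + w * tcp)        # ratio alpha_{i+1} / alpha_i
        alpha1 = (1 - q) / (1 - q ** n) if q != 1 else 1.0 / n
        return [alpha1 * q ** i for i in range(n)]

    def finish_time(alphas, w, z, tcp, tcm):
        """Finish time when processor i must wait for the first i transmissions
        before computing its own fraction."""
        latest, sent = 0.0, 0.0
        for a in alphas:
            sent += a * z * tcm                    # cumulative communication delay
            latest = max(latest, sent + a * w * tcp)
        return latest

    N, w, z, tcp, tcm = 8, 1.0, 0.2, 1.0, 1.0      # illustrative homogeneous parameters
    opt = optimal_fractions(N, w, z, tcp, tcm)
    eq = [1.0 / N] * N                             # simple equal-division schedule
    t_opt = finish_time(opt, w, z, tcp, tcm)
    t_eq = finish_time(eq, w, z, tcp, tcm)
    print("staggered optimal: finish", round(t_opt, 4), "speedup", round(w * tcp / t_opt, 2))
    print("equal division   : finish", round(t_eq, 4), "speedup", round(w * tcp / t_eq, 2))

Under this assumed model the optimal speedup saturates near (w*Tcp)/(z*Tcm) as N grows, consistent with the link-speed-limited saturation discussed in Chap. 3, and the equal-division finish time exceeds the staggered one, in line with the comparison of Chap. 4.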
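The expected record search time of Chap. 6 can be illustrated in the same style. The sketch below continues the previous listing (it reuses opt, w, z, tcp and tcm defined there) and is an assumption-laden approximation rather than the expression derived in the thesis: it supposes a single target record equally likely to lie anywhere in the file, fractions delivered by the staggered schedule above, and each processor beginning its scan only after fully receiving its fraction.

    # Hypothetical illustration (assumed model, not the Chap. 6 derivation):
    # expected time to locate one uniformly placed record when the fractions
    # are delivered sequentially as in the sketch above.
    def expected_single_record_time(alphas, w, z, tcp, tcm):
        expected, sent = 0.0, 0.0
        for a in alphas:
            sent += a * z * tcm                          # this processor's scan start time
            expected += a * (sent + 0.5 * a * w * tcp)   # P(record here) * E[time | here]
        return expected

    print("expected search time:", round(expected_single_record_time(opt, w, z, tcp, tcm), 4))

Because on average only part of each fraction must be scanned before the record is reached, the value returned is below the finish time computed in the previous sketch.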