Performance Effects of Scheduling Strategies for Master/Slave Distributed Applications

Achieving good parallel application performance on non-dedicated workstation clusters requires careful attention to the scheduling of tasks and communication on the underlying platform. In the literature, application scheduling policies are usually chosen by matching the resource requirements of an application with the performance characteristics of the target platform. However, when clusters of workstations are shared with other users, platform performance is non-uniform and varies over time. As a result, the performance of distinct scheduling policies may also vary, depending on the dynamic system state and the particular characteristics of the job being run. Our experimental work focuses on a master/slave parallel ray-tracing application executing on a set of workstation clusters at UCSD and the San Diego Supercomputer Center. The experiments show that two different scheduling strategies, one static and one dynamic, exhibit very different performance sensitivities to variability in resource capabilities and workload distribution. We demonstrate for our example application that neither scheduling strategy by itself consistently yields the best application performance (minimal execution time) when running on the same resources under normally experienced production operating conditions. These results support the idea that dynamically selecting a scheduling strategy to match run-time conditions is a promising approach to achieving good performance for master/slave applications on heterogeneous time-shared workstation clusters.
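
The contrast between the two kinds of strategy can be illustrated with a small toy model. The sketch below is not the ray tracer or the schedulers used in the experiments. It assumes a static strategy that splits the tasks into equal-count blocks before the run, and a dynamic strategy in which the master hands the next task to whichever worker becomes idle first. The names `static_makespan`, `dynamic_makespan`, `speeds`, and `dispatch_overhead` are hypothetical, introduced only for illustration.

```python
import heapq


def static_makespan(task_costs, speeds):
    """Toy static strategy: each worker gets one contiguous, equal-count
    block of tasks, fixed before execution begins."""
    n, w = len(task_costs), len(speeds)
    finish_times = []
    for i, speed in enumerate(speeds):
        block = task_costs[i * n // w:(i + 1) * n // w]
        # A worker with effective speed s needs cost / s time units per task.
        finish_times.append(sum(block) / speed)
    return max(finish_times)


def dynamic_makespan(task_costs, speeds, dispatch_overhead=0.0):
    """Toy dynamic strategy: the master gives the next task to the worker
    that becomes idle first, paying a fixed overhead per dispatch
    (an assumed stand-in for request/reply communication)."""
    # Heap of (time at which the worker becomes idle, worker index).
    idle = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle)
    finish = 0.0
    for cost in task_costs:
        t, i = heapq.heappop(idle)
        t += dispatch_overhead + cost / speeds[i]
        finish = max(finish, t)
        heapq.heappush(idle, (t, i))
    return finish


if __name__ == "__main__":
    tasks = [1.0] * 8
    for label, speeds in [("uniform", [1.0, 1.0]), ("one loaded", [1.0, 0.25])]:
        print(label,
              "static:", static_makespan(tasks, speeds),
              "dynamic:", dynamic_makespan(tasks, speeds, dispatch_overhead=0.5))
```

In this toy setting, with two equally fast workers the static split finishes first because it pays no per-task dispatch cost, while slowing one worker to a quarter of its speed reverses the ordering. That is only meant to make plausible why neither strategy wins consistently. The measured behaviour on shared clusters depends on many factors that this model ignores.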
