Static Load Distribution for Communication Intensive Parallel Computing in Multiclusters

In this paper, we apply divisible load theory to find load distributions that minimize total run time for parallel algorithms on multi-clusters. Even when processor speeds are homogeneous, a multi-cluster computation that assigns load evenly can run at less than maximum efficiency because communication costs are heterogeneous. Using a modified version of the LogP parallel computing model, we propose a general technique for assigning load among multiple clusters so as to minimize the time each processor spends waiting. We use this technique to determine the optimal load distribution for spin glass simulation and parallel bucket sort in multi-cluster systems. It also allows fast analysis of the effects of adding processors or clusters to the computation. We experimentally demonstrate the accuracy of our model and show how it eliminates wait time in multi-cluster parallel computations. Load distributions derived from our technique reduce execution time by up to 50%, depending on the degree of heterogeneity among clusters and the communication characteristics of the computation.
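The idea of assigning load so that no cluster waits can be illustrated with a minimal sketch. It is not the paper's modified LogP model. It assumes a much simpler, hypothetical cost model in which cluster i finishes at time `loads[i] * per_unit_time[i] + fixed_overhead[i]`, where `fixed_overhead` stands in for that cluster's communication cost. The function name and all parameters are illustrative assumptions. Setting every finish time equal and requiring the loads to sum to the total gives a closed-form distribution.

```python
def balanced_loads(total_load, per_unit_time, fixed_overhead):
    """Split a divisible load so that all clusters finish at the same time.

    Assumed (hypothetical) cost model, not the paper's modified LogP model:
        finish_i = loads[i] * per_unit_time[i] + fixed_overhead[i]
    Solving finish_i = T for every i, with sum(loads) = total_load, gives
        T = (total_load + sum(o_i / w_i)) / sum(1 / w_i)
    """
    pairs = list(zip(per_unit_time, fixed_overhead))
    finish = (total_load + sum(o / w for w, o in pairs)) / sum(1.0 / w for w, _ in pairs)
    loads = [(finish - o) / w for w, o in pairs]
    if any(load < 0 for load in loads):
        # A cluster whose overhead exceeds the common finish time should be dropped.
        raise ValueError("some cluster would receive negative load")
    return finish, loads


def even_split_time(total_load, per_unit_time, fixed_overhead):
    """Run time when load is split evenly: the slowest cluster decides."""
    share = total_load / len(per_unit_time)
    return max(share * w + o for w, o in zip(per_unit_time, fixed_overhead))


# Two clusters with identical processor speed but different communication cost.
finish, loads = balanced_loads(100.0, [1.0, 1.0], [0.0, 20.0])
even = even_split_time(100.0, [1.0, 1.0], [0.0, 20.0])
print(finish, loads, even)
```

In this toy setting, the even split leaves the first cluster idle while the second finishes, whereas the balanced split moves load toward the cluster with cheaper communication until both finish together.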
