Communication-Aware Load Balancing for Parallel Applications on Clusters

Cluster computing has emerged as a primary and cost-effective platform for running parallel applications, including communication-intensive applications that transfer large amounts of data among cluster nodes over the interconnection network. Conventional load balancers have proven effective at increasing the utilization of CPU, memory, and disk I/O resources in a cluster. However, most existing load-balancing schemes ignore network resources, which leaves room to improve the effective network bandwidth of clusters running parallel applications. We therefore propose a communication-aware load-balancing technique that improves the performance of communication-intensive applications by increasing the effective utilization of networks in cluster environments. To support this scheme, we introduce a behavior model for parallel applications with substantial network, CPU, memory, and disk I/O requirements. Our load-balancing scheme uses this model to quickly and accurately determine the load induced by a variety of parallel applications. Simulation results from a diverse set of synthetic bulk-synchronous and real parallel applications on a cluster show that, compared with existing schemes, our scheme improves performance in terms of slowdown by up to 206 percent (74 percent on average) and in terms of turnaround time by up to 235 percent (82 percent on average).
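The abstract does not give the details of the behavior model or the load index, so the following is only a hypothetical illustration of why ignoring the network can mislead a load balancer, not the authors' method. It assumes each node reports a per-resource utilization in [0, 1] and each task's additional demand per resource is known in advance; the names `Node`, `Task`, `projected_bottleneck`, and the bottleneck (max-utilization) rule are all illustrative assumptions.

```python
from dataclasses import dataclass, field

# Resources a (hypothetical) load index tracks on every node.
RESOURCES = ("cpu", "mem", "disk", "net")


@dataclass
class Node:
    name: str
    # Assumed: current utilization of each resource, as a fraction in [0, 1].
    util: dict = field(default_factory=dict)


@dataclass
class Task:
    name: str
    # Assumed: extra utilization the task would add to each resource.
    demand: dict = field(default_factory=dict)


def projected_bottleneck(node: Node, task: Task) -> float:
    """Highest per-resource utilization on `node` after placing `task` there."""
    return max(node.util.get(r, 0.0) + task.demand.get(r, 0.0) for r in RESOURCES)


def cpu_only_choice(nodes: list) -> Node:
    """Conventional placement: pick the node with the lowest CPU load."""
    return min(nodes, key=lambda n: n.util.get("cpu", 0.0))


def communication_aware_choice(nodes: list, task: Task) -> Node:
    """Illustrative placement: minimize the projected bottleneck, network included."""
    return min(nodes, key=lambda n: projected_bottleneck(n, task))


if __name__ == "__main__":
    nodes = [
        Node("A", {"cpu": 0.2, "mem": 0.3, "disk": 0.1, "net": 0.9}),  # idle CPU, busy link
        Node("B", {"cpu": 0.5, "mem": 0.3, "disk": 0.1, "net": 0.2}),
    ]
    task = Task("comm-heavy", {"cpu": 0.2, "net": 0.3})
    print("CPU-only picks:", cpu_only_choice(nodes).name)
    print("Communication-aware picks:", communication_aware_choice(nodes, task).name)
```

In this toy scenario a CPU-only balancer sends the communication-heavy task to node A, whose link would be oversubscribed, while the network-aware rule chooses node B. The max-utilization rule is just one simple way to combine resources; a weighted sum or a model-based estimate of task slowdown would be equally plausible choices.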
