Improving effective bandwidth of networks on clusters using load balancing for communication-intensive applications

Clusters have emerged as a primary, cost-effective infrastructure for parallel applications, including communication-intensive applications that transfer large amounts of data among cluster nodes over interconnection networks. Conventional load balancers have proven effective at increasing the utilization of CPU, memory, and disk I/O resources in a cluster. However, most existing load-balancing schemes ignore network resources, leaving an opportunity to improve the effective network bandwidth of clusters running parallel applications. We therefore propose a communication-aware load-balancing technique that improves the performance of communication-intensive applications by increasing the effective utilization of networks in cluster environments. The scheme uses an application model to quickly and accurately determine the load induced by a variety of parallel applications. Simulation results for a wide range of parallel applications on a cluster show that, compared with three existing schemes, the proposed scheme improves slowdown by up to 206% (74% on average) and turn-around time by up to 235% (82% on average).
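To make the idea of communication-aware placement concrete, the following is a minimal sketch, not the scheme evaluated in the paper. It assumes each node reports a normalized CPU load and a normalized network load, and that a node's load index is a weighted sum of the two. The names `Node`, `load_index`, `pick_node`, and the weights `w_cpu` and `w_net` are hypothetical and chosen only for illustration. The `slowdown` helper assumes the common definition of slowdown as turn-around time divided by the execution time on a dedicated system.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_load: float  # assumed normalized to [0, 1]
    net_load: float  # assumed fraction of link bandwidth in use, [0, 1]


def load_index(node: Node, w_cpu: float = 0.5, w_net: float = 0.5) -> float:
    """Hypothetical load index: weighted sum of CPU and network load."""
    return w_cpu * node.cpu_load + w_net * node.net_load


def pick_node(nodes, w_cpu: float = 0.5, w_net: float = 0.5) -> Node:
    """Return the node with the smallest load index."""
    return min(nodes, key=lambda n: load_index(n, w_cpu, w_net))


def slowdown(turnaround_time: float, dedicated_time: float) -> float:
    """Assumed definition: turn-around time relative to a dedicated run."""
    return turnaround_time / dedicated_time


if __name__ == "__main__":
    nodes = [Node("a", cpu_load=0.2, net_load=0.9),
             Node("b", cpu_load=0.5, net_load=0.3)]
    # A balancer that ignores the network (w_net = 0) looks only at CPU load.
    print(pick_node(nodes, w_cpu=1.0, w_net=0.0).name)
    # A communication-aware balancer also accounts for the busy link on "a".
    print(pick_node(nodes).name)
```

With these illustrative numbers, the CPU-only choice lands on the node whose network link is already heavily used, while the weighted index steers the task to the other node. This is the kind of difference a communication-aware scheme is meant to exploit.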
