Contention-Free Many-to-Many Communication Scheduling for High Performance Clusters

Efficient, contention-free scheduling of inter-node communication through a switch fabric, in cluster computing or data center environments, has been studied in the literature for all-to-all communication with equal-sized data transfer requests [1, 3, 4]. In this paper, we propose a communication scheduling module (CSM) that generates contention-free schedules for many-to-many communication with arbitrarily sized data. To this end, we propose three approximation algorithms: PST, LDT and SDT. Periodically, the CSM generates a bipartite graph from the set of received requests, determines which of the three algorithms gives the best approximation factor on this graph, and executes that algorithm to generate a contention-free schedule. Algorithm PST has a worst-case run time of O(max(Δ|E|, |E| log |E|)) and guarantees an approximation factor of 2H_{2Δ-1}, where |E| is the number of edges in the bipartite graph, Δ is the maximum node degree of the bipartite graph and H_{2Δ-1} is the (2Δ - 1)-th harmonic number. LDT runs in O(|E|^2) time and has an approximation factor of 2(1 + τ), where τ is a constant guard band, or pause time, that eliminates contention which system jitter and synchronization inaccuracies between the nodes could otherwise cause in an apparently contention-free schedule. SDT gives an approximation factor of 4 log(w_max) and has a worst-case run time of O(Δ|E| log(w_max)), where w_max is the longest communication time in the set of received requests.
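The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the request format `(sender, receiver, duration)`, the function names, the base-2 logarithm in the SDT factor, and the treatment of each node as one sender-side and one receiver-side vertex are all assumptions made for illustration. The sketch only evaluates the three stated approximation factors and picks the smallest; it does not implement PST, LDT or SDT themselves.

```python
import math
from collections import Counter
from fractions import Fraction


def harmonic(n):
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n exactly."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def max_degree(requests):
    """Maximum node degree of the bipartite request graph.

    Assumption: senders form one side and receivers the other, so a node
    that both sends and receives appears once on each side.
    """
    out_deg = Counter(sender for sender, _, _ in requests)
    in_deg = Counter(receiver for _, receiver, _ in requests)
    return max(max(out_deg.values()), max(in_deg.values()))


def approximation_factors(requests, tau):
    """Evaluate the approximation factors quoted in the abstract.

    Assumption: the logarithm in the SDT factor is base 2 and durations
    are expressed in units where w_max >= 1.
    """
    delta = max_degree(requests)
    w_max = max(duration for _, _, duration in requests)
    return {
        "PST": 2 * float(harmonic(2 * delta - 1)),
        "LDT": 2 * (1 + tau),
        "SDT": 4 * math.log2(w_max),
    }


def select_algorithm(requests, tau):
    """Return the name of the algorithm with the smallest factor."""
    factors = approximation_factors(requests, tau)
    return min(factors, key=factors.get)


# Hypothetical request set: (sender, receiver, communication time).
requests = [("a", "b", 8), ("a", "c", 2), ("b", "c", 4)]
factors = approximation_factors(requests, tau=0.1)
choice = select_algorithm(requests, tau=0.1)
```

For this hypothetical request set, Δ = 2 and w_max = 8, so the PST factor is 2H_3 = 11/3, the LDT factor is 2.2 and the SDT factor (with a base-2 logarithm) is 12; the sketch therefore selects LDT. Since τ is a constant, the choice in practice depends on how Δ and w_max compare for the current batch of requests.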

[1]  Marcos K. Aguilera, et al. Distributed Computing and Networking, 2011, Lecture Notes in Computer Science.

[2]  Richard Cole, et al. Edge-Coloring Bipartite Multigraphs in O(E log D) Time, 1999, Comb.

[3]  Ashish Goel, et al. Efficient, Fully Local Algorithms for CIOQ Switches, 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[4]  Devavrat Shah, et al. Switch scheduling via randomized edge coloring, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[5]  Yuanyuan Yang, et al. Optimal All-to-All Personalized Exchange in Self-Routable Multistage Networks, 2000, IEEE Trans. Parallel Distributed Syst.

[6]  Xin Yuan, et al. Bandwidth Efficient All-to-All Broadcast on Switched Clusters, 2005, 2005 IEEE International Conference on Cluster Computing.

[7]  Cho-Li Wang, et al. Contention-Aware Communication Schedule for High-Speed Communication, 2003, Cluster Computing.

[8]  Peter Sanders, et al. An asymptotic approximation scheme for multigraph edge coloring, 2005, SODA '05.

[9]  Peter Luksch, et al. MethWerk: Scalable Mesh-based Simulation on Clusters of SMPs, 2005.

[10]  Martin Berzins, et al. A comparison of some dynamic load-balancing algorithms for a parallel adaptive flow solver, 2000, Parallel Comput.

[11]  Terry Clyde Wilcox. Dynamic Load Balancing of Virtual Machines Hosted on Xen, 2008.

[12]  Koushik Sinha, et al. Efficient Load Balancing on a Cluster for Large Scale Online Video Surveillance, 2009, ICDCN.

[13]  Magnus Jonsson, et al. Efficient many-to-many real-time communication using an intelligent Ethernet switch, 2004, 7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004. Proceedings.