Task Scheduling Greedy Heuristics for GPU Heterogeneous Cluster Involving the Weights of the Processor

Modern GPUs are increasingly adopted by cluster computing systems as high-performance computing units because of their outstanding computational power, but they also introduce system-level architectural heterogeneity (among different nodes) into the cluster. In this paper, based on the MPI and CUDA programming models, we investigate task scheduling for GPU heterogeneous clusters by taking into account these system-level heterogeneous characteristics as well as the weights of the processors (both CPUs and GPUs). First, based on our GPU heterogeneous cluster, we classify the tasks to be executed into six major classes according to their degree of parallelism, input data size, and processing workload. Then, to achieve an approximately optimal mapping between tasks and computing resources, a task scheduling strategy is presented. We present the WSLSA greedy heuristic, which can take the weights of the processors into account. We also define two measurement factors for task assignments. The first is the maximum total workload over all task assignments, which captures the maximum workload on the GPU heterogeneous cluster. The second is the distribution of task assignments, which determines the load balance of the task assignments across the GPU heterogeneous cluster.
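To make the idea of a weight-aware greedy assignment concrete, the following is a minimal illustrative sketch in Python. It is not the WSLSA heuristic itself, whose details are not given in this abstract. It assumes that each task has a single scalar workload, that each processor (CPU or GPU) has a positive weight standing in for its relative throughput, and that a task goes to the processor with the smallest weighted finish time. The names `weighted_greedy_assign` and `assignment_metrics` are hypothetical, and the two returned metrics are only one possible reading of the two measurement factors described above.

```python
from statistics import pstdev


def weighted_greedy_assign(workloads, weights):
    """Greedily map tasks to processors using per-processor weights.

    workloads: list of task workloads (assumed scalar cost per task).
    weights:   list of positive processor weights (assumed relative speed).
    Returns (assignment, loads): task indices per processor, and the
    total raw workload placed on each processor.
    """
    loads = [0.0] * len(weights)
    assignment = [[] for _ in weights]
    # Consider the largest tasks first (a common greedy ordering; assumed here).
    order = sorted(range(len(workloads)), key=lambda i: workloads[i], reverse=True)
    for task in order:
        # Pick the processor whose weighted finish time would be smallest.
        best = min(
            range(len(weights)),
            key=lambda p: (loads[p] + workloads[task]) / weights[p],
        )
        loads[best] += workloads[task]
        assignment[best].append(task)
    return assignment, loads


def assignment_metrics(loads, weights):
    """Return (maximum total workload, spread of weight-normalized loads).

    The spread is the population standard deviation of load / weight;
    a value near zero suggests a balanced assignment under this model.
    """
    normalized = [load / weight for load, weight in zip(loads, weights)]
    return max(loads), pstdev(normalized)


if __name__ == "__main__":
    # Hypothetical example: one CPU (weight 1) and one GPU (weight 4).
    assignment, loads = weighted_greedy_assign([8, 6, 4, 2], [1.0, 4.0])
    print(assignment, loads, assignment_metrics(loads, [1.0, 4.0]))
```

In this sketch a processor with a larger weight absorbs proportionally more work before it stops being the preferred target, which is one simple way for a scheduler to reflect CPU and GPU differences across nodes.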
