A Data Partitioning Model for Highly Heterogeneous Systems

Latest-generation supercomputers running bioinformatics workloads combine multiple heterogeneous processing units and therefore require intelligent workload distribution. This paper describes an accurate static workload balancing model that (i) balances the workload efficiently with no significant overhead, because it requires only lightweight, static off-line profiling, and (ii) deactivates slower devices. The effectiveness of the approach is validated experimentally using several representative bioinformatics workloads on three heterogeneous platforms.
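The abstract does not spell out the model itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that off-line profiling yields one throughput figure per device (items per second), that work is split in proportion to those figures, and that a device is deactivated when its proportional share would fall below a threshold. The function name `partition`, the `min_share` parameter, and the device names are all hypothetical.

```python
from typing import Dict


def partition(total_items: int,
              throughput: Dict[str, float],
              min_share: float = 0.05) -> Dict[str, int]:
    """Split total_items across devices in proportion to profiled throughput.

    Assumption (not from the paper): a device is deactivated, i.e. given
    zero items, when its proportional share would be below min_share.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")

    # Keep only devices with a usable (positive) profiled throughput.
    active = {d: t for d, t in throughput.items() if t > 0}
    if not active:
        raise ValueError("at least one device needs positive throughput")

    # Drop the slowest device one at a time while its share is too small.
    # The last remaining device is never dropped.
    while len(active) > 1:
        total = sum(active.values())
        slowest = min(active, key=active.get)
        if active[slowest] / total >= min_share:
            break
        del active[slowest]

    # Proportional split, rounded down, then hand out the leftover items
    # to the devices with the largest fractional remainders.
    total = sum(active.values())
    exact = {d: total_items * t / total for d, t in active.items()}
    counts = {d: int(x) for d, x in exact.items()}
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(active, key=lambda d: exact[d] - counts[d],
                          reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1

    # Deactivated devices are reported with zero items.
    return {d: counts.get(d, 0) for d in throughput}


if __name__ == "__main__":
    # Hypothetical profiled throughputs in items per second.
    profile = {"gpu0": 900.0, "gpu1": 600.0, "cpu": 30.0}
    print(partition(1000, profile))
```

Because the split is computed once from the profiled figures, nothing is measured or rebalanced at run time, which is where a static scheme avoids overhead. The trade-off is that the split is only as good as the profile it was derived from.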
