Heterogeneous computing with accelerators: an overview with examples

Accelerator-based platforms are heterogeneous by nature, yet most applications ignore this heterogeneity and focus on acceleration alone. Exploiting platform-level heterogeneity can bring significant performance improvements, because it essentially means using additional resources for the same computation. But is the performance gained from these additional resources worth the effort of programming and deploying heterogeneous applications? In this work, we present a taxonomy of the existing programming models and tools for heterogeneous computing with accelerators, and give examples of systems that fit each class. We further provide guidelines for efficiently navigating this landscape when searching for a suitable tool to design and deploy a new application.
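The idea that heterogeneity "essentially means using additional resources for the same computation" can be made concrete with a deliberately simple model. The sketch below assumes a perfectly divisible data-parallel workload, a constant and known throughput per device (with any data-transfer cost folded into the accelerator's figure), and no contention between devices. `Device`, `split_workload`, and `makespan` are hypothetical names chosen for illustration; they are not part of any tool covered in this survey.

```python
"""Idealized static CPU + accelerator work split (illustrative model only)."""

from dataclasses import dataclass


@dataclass
class Device:
    name: str
    # Work items per second; assumed constant and positive (hypothetical model).
    throughput: float


def split_workload(n_items: int, cpu: Device, acc: Device) -> tuple[int, int]:
    """Split n_items so that both devices finish at roughly the same time.

    Each device receives a share proportional to its throughput; any
    rounding remainder is assigned to the accelerator.
    """
    total = cpu.throughput + acc.throughput
    cpu_items = int(n_items * cpu.throughput / total)
    return cpu_items, n_items - cpu_items


def makespan(cpu_items: int, acc_items: int, cpu: Device, acc: Device) -> float:
    """Runtime of concurrent execution: the slower of the two devices decides."""
    return max(cpu_items / cpu.throughput, acc_items / acc.throughput)


if __name__ == "__main__":
    cpu = Device("cpu", throughput=1e6)
    gpu = Device("gpu", throughput=9e6)
    n = 10_000_000
    c, g = split_workload(n, cpu, gpu)
    t_het = makespan(c, g, cpu, gpu)
    t_gpu_only = n / gpu.throughput
    # Prints the CPU share, the GPU share, and the gain over GPU-only execution.
    print(c, g, round(t_gpu_only / t_het, 3))
```

Under these assumptions, giving each device work in proportion to its throughput makes both finish together, and the best-case gain over accelerator-only execution is (r_cpu + r_acc) / r_acc, about 1.11x for the throughputs used here. Real applications rarely satisfy these assumptions (irregular workloads, transfer overheads, contention), which is why the question of whether the gain justifies the programming effort is not trivial.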
