Offload - Automating Code Migration to Heterogeneous Multicore Systems

We present Offload, a programming model for offloading parts of a C++ application to run on accelerator cores in a heterogeneous multicore system. Code to be offloaded is enclosed in an offload scope; all functions called, directly or indirectly, from an offload scope are compiled for the accelerator cores. Data defined inside an offload scope resides in accelerator memory, data defined outside it resides in host memory, and the compiler automatically generates the code that moves data between the two memory spaces. This is achieved by distinguishing between host and accelerator pointers at the type level, and by compiling multiple versions of each function, one per configuration of its pointer parameters, using automatic call-graph duplication. We discuss solutions to several challenging issues related to call-graph duplication, and present an implementation of Offload for the Cell BE processor, evaluated using a number of benchmarks.
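The idea of typing pointers by memory space and compiling one function version per pointer configuration can be illustrated in standard C++. The sketch below is not Offload syntax and is not the compiler's implementation: it uses templates as a rough analog of call-graph duplication, and every name in it (`HostSpace`, `LocalSpace`, `Ptr`, `load`, `dot`, `transfer_count`) is hypothetical. A `memcpy` plus a counter stands in for a host-to-accelerator transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tags naming the memory space a pointer refers to.
struct HostSpace {};
struct LocalSpace {};

// A pointer whose memory space is part of its type (illustrative only).
template <typename T, typename Space>
struct Ptr {
    const T* raw;
};

// Counts simulated host-to-accelerator transfers; a stand-in for real DMA.
inline int& transfer_count() {
    static int n = 0;
    return n;
}

// Accelerator-local data is read directly.
template <typename T>
T load(Ptr<T, LocalSpace> p, std::size_t i) {
    return p.raw[i];
}

// Host data is fetched through a simulated transfer.
template <typename T>
T load(Ptr<T, HostSpace> p, std::size_t i) {
    T tmp;
    std::memcpy(&tmp, p.raw + i, sizeof(T));
    ++transfer_count();
    return tmp;
}

// One source-level function. The compiler instantiates a separate version
// for each (SA, SB) combination used, loosely mirroring how a function is
// duplicated per pointer-parameter configuration.
template <typename SA, typename SB>
float dot(Ptr<float, SA> a, Ptr<float, SB> b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += load(a, i) * load(b, i);
    }
    return acc;
}
```

Calling `dot` with a host pointer and a local pointer selects a version that transfers only the first operand, while the local/local version performs no transfers at all. In Offload this selection is made by the compiler rather than written out by the programmer.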
