An OpenCL optimizing compiler for reconfigurable processors

This paper presents simple and efficient optimization techniques for an OpenCL compiler that targets reconfigurable processors. The target architecture consists of a general-purpose processor core and an embedded reconfigurable accelerator with vector units. To achieve high performance, the accelerator can switch its architecture between VLIW mode and Coarse-Grained Reconfigurable Array (CGRA) mode. A major drawback of this architecture is that it is difficult to program, and OpenCL can be a good solution. However, since OpenCL does not guarantee performance portability, hardware-dependent optimization is still necessary. Hence, we develop an OpenCL compiler framework that exploits the mode-switching capability and the vector units. To measure the effectiveness of the techniques, we have implemented the OpenCL framework and evaluated the techniques with fourteen OpenCL benchmark applications.
