OpenCL for FPGAs: Prototyping a Compiler

Hardware acceleration using FPGAs has been shown to reduce the runtime of computationally intensive applications by orders of magnitude compared to traditional stand-alone computers [1]. This is possible because an FPGA can perform many computations at the same time in a truly parallel fashion. However, designing parallel computation at the hardware level requires a great deal of expertise, which limits the adoption of FPGA-based acceleration platforms. Recent interest in enabling software programmers to use GPUs for general-purpose computing has spurred the development of languages for this purpose. OpenCL is one such language: it enables a programmer to specify parallelism at a high level and to put together an application that can take advantage of low-level hardware acceleration. In this paper, we present a framework to support OpenCL compilation to FPGAs. We begin with two case studies that show how OpenCL compilation could be done by hand, to motivate our work. We discuss how these case studies influenced the inception of an OpenCL compiler for FPGAs. We then present the compilation flow and results on a set of benchmarks that show the effectiveness of our automated compiler. We compare our work to prior art and show that using OpenCL as a system design language enables large-scale design of high-performance computing applications.

[1] Viktor K. Prasanna, et al. Analysis of high-performance floating-point arithmetic on FPGAs, 2004, 18th International Parallel and Distributed Processing Symposium.

[2] Mark N. Wegman, et al. Efficiently computing static single assignment form and the control dependence graph, 1991, TOPLAS.

[3] Lili Su, et al. FPGA-Accelerated Molecular Dynamics Simulations System, 2009, 2009 International Conference on Scalable Computing and Communications; Eighth International Conference on Embedded Computing.

[4] Walid A. Najjar, et al. Automatic Compilation Framework for Bloom Filter Based Intrusion Detection, 2006, ARC.

[5] Jason Cong, et al. FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs, 2009, 2009 IEEE 7th Symposium on Application Specific Processors.

[6] Kazutoshi Wakabayashi. C-based behavioral synthesis and verification analysis on industrial design examples, 2004.

[7] Jordi Cortadella, et al. Synthesis of synchronous elastic architectures, 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[8] Jordi Cortadella, et al. Synchronous Elastic Circuits with Early Evaluation and Token Counterflow, 2007, 2007 44th ACM/IEEE Design Automation Conference.

[9] Viktor K. Prasanna, et al. High Performance Dictionary-Based String Matching for Deep Packet Inspection, 2010, 2010 Proceedings IEEE INFOCOM.

[10] Eric L. Miller, et al. Parallel-Beam Backprojection: An FPGA Implementation Optimized for Medical Imaging, 2005, J. VLSI Signal Process.

[11] Muhsen Owaida, et al. Synthesis of Platform Architectures from OpenCL Programs, 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.

[12] Steven J. E. Wilton, et al. Activity Estimation for Field-Programmable Gate Arrays, 2006, 2006 International Conference on Field Programmable Logic and Applications.

[13] Sadaf R. Alam, et al. Scientific Computing Beyond CPUs: FPGA implementations of common scientific kernels, 2005.

[14] Siddharth Joshi, et al. FPGA Based High Performance Double-Precision Matrix Multiplication, 2009, VLSI Design.

[15] Yong Dou, et al. 64-bit floating-point FPGA matrix multiplication, 2005, FPGA '05.

[16] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.