Macah: A "C-Level" Language for Programming Kernels on Coprocessor Accelerators

Coprocessor accelerator architectures like FPGAs and GPUs are increasingly used in embedded systems because of their high performance on computation-heavy inner loops of a variety of applications. However, current languages and compilers for these architectures make it challenging to efficiently implement kernels that have complex, input-dependent control flow and data access patterns. In this paper we argue that providing language support for such kernels significantly broadens the applicability of accelerator architectures. We then describe a new language, called Macah, and compiler that provide this support. Macah is a “C-level” language, in the sense that it forces programmers to think about some of the abstract architectural characteristics that make accelerators different from conventional processors. However, the compiler still fills in several important architecture-specific details, so that programming in Macah is substantially easier than using hardware description languages or coprocessor-specific assembly languages. We have implemented a prototype Macah compiler that produces simulatable Verilog which represents the input program mapped onto a model of an accelerator. Among other applications, we have programmed a complex kernel taken from video compression in Macah and have it running in simulation.
