Optimus: efficient realization of streaming applications on FPGAs

In this paper, we introduce Optimus: an optimizing synthesis compiler for streaming applications. Optimus compiles programs written in a high level streaming language to either software or hardware implementations. The compiler uses a hierarchical compilation strategy that separates concerns between macro- and micro-functional requirements. Macro-functional concerns address how components (modules) are assembled to implement larger more complex applications. Micro-functional issues deal with synthesis issues of the module internals. Optimus thus allows software developers who lack deep hardware design expertise to transparently leverage the advantages of hardware customization without crossing the semantic gap between high level languages and hardware description languages. Optimus generates streaming hardware that achieves on average 40x speedup over our baseline embedded processor for a fraction of the energy. Additionally, our results show that streaming-specific optimizations can further improve performance by 255% and reduce the area requirements by 16% in average. These designs are competitive with Handel-C implementations for some of the same benchmarks.

[1]  William R. Mark,et al.  Cg: a system for programming graphics hardware in a C-like language , 2003, ACM Trans. Graph..

[2]  Henry Hoffmann,et al.  Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[3]  E.A. Lee,et al.  Synchronous data flow , 1987, Proceedings of the IEEE.

[4]  Hong Song,et al.  A Programming Model for an Embedded Media Processing Architecture , 2005, SAMOS.

[5]  Kees A. Vissers,et al.  Optimized generation of data-path from C codes for FPGAs , 2005, Design, Automation and Test in Europe.

[6]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..

[7]  John J. Granacki,et al.  DEFACTO: A Design Environment for Adaptive Computing Technology , 1999, IPPS/SPDP Workshops.

[8]  John Wawrzynek,et al.  Garp: a MIPS processor with a reconfigurable coprocessor , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Cat. No.97TB100186).

[9]  Scott A. Mahlke,et al.  Orchestrating the execution of stream programs on multicore platforms , 2008, PLDI '08.

[10]  Frank Vahid,et al.  C is for circuits: capturing FPGA circuits as sequential code for portability , 2008, FPGA '08.

[11]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.

[12]  Michael J. Flynn,et al.  StReAm: Object-Oriented Programming of Stream Architectures Using PAM-Blox , 2000, FPL.

[13]  Michael K. Chen,et al.  Shangri-La: achieving high performance from compiled network applications while enabling ease of programming , 2005, PLDI '05.

[14]  Michael I. Gordon,et al.  Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.

[15]  David Kirk,et al.  NVIDIA cuda software and gpu parallel computing architecture , 2007, ISMM '07.

[16]  Calton Pu,et al.  Spidle: A DSL Approach to Specifying Streaming Applications , 2003, GPCE.

[17]  Nikil D. Dutt,et al.  SPARK: a high-level synthesis framework for applying parallelizing compiler transformations , 2003, 16th International Conference on VLSI Design, 2003. Proceedings..

[18]  Bruce A. Draper,et al.  High-Level Language Abstraction for Reconfigurable Computing , 2003, Computer.