A Coarse-Grain FPGA Overlay for Executing Data Flow Graphs

We explore the feasibility of using a coarse-grain overlay to transparently and dynamically accelerate the execution of hot code segments that run on soft processors. The overlay, referred to as the Virtual Dynamically Reconfigurable (VDR) overlay, is tuned to realize data flow graphs in which the nodes are machine instructions and the edges are inter-instruction dependences. A VDR consists of an array of functional units interconnected by a set of programmable switches. The soft processor can rapidly configure it at run-time to implement a given data flow graph. The use of a VDR overcomes two key challenges of run-time translation of code into circuits: the prohibitive compile time of standard synthesis tools and the limited run-time reconfigurability of commodity FPGAs. A preliminary evaluation shows that a benchmark-specific VDR overlay can speed up the execution of a benchmark by up to 9X over a Nios II processor. The overlay incurs a 6.4X penalty in resources compared to Nios II. This work is a resubmission of earlier work that appeared in FCCM 2011 [16].

Keywords: Overlay architectures; dynamic acceleration of soft processors; just-in-time compilation
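As a rough illustration of the model described in the abstract, the sketch below represents a data flow graph whose nodes are instructions and whose edges are dependences, assigns each node to a functional unit, and derives the switch connections between units. All names here (Node, configure, execute, the three-operation set) are hypothetical and chosen only for illustration; the sketch does not reproduce the actual VDR architecture, its configuration format, or its placement strategy.

```python
from dataclasses import dataclass
import operator

# Hypothetical operation set; the real functional units are not specified here.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

@dataclass(frozen=True)
class Node:
    name: str    # value produced by this instruction
    op: str      # key into OPS
    srcs: tuple  # names of producer nodes or of external inputs

def configure(dfg, num_units):
    """Assign one functional unit per node and list unit-to-unit routes."""
    if len(dfg) > num_units:
        raise ValueError("data flow graph does not fit on the overlay")
    # Assumption: naive placement, node i goes to unit i.
    unit_of = {node.name: i for i, node in enumerate(dfg)}
    # One route per dependence edge between two nodes in the graph.
    routes = [(unit_of[s], unit_of[n.name])
              for n in dfg for s in n.srcs if s in unit_of]
    return unit_of, routes

def execute(dfg, inputs):
    """Evaluate the graph; nodes must be listed in topological order."""
    values = dict(inputs)
    for node in dfg:
        a, b = (values[s] for s in node.srcs)
        values[node.name] = OPS[node.op](a, b)
    return values

# Example graph computing (a + b) * (a - b).
DFG = [
    Node("t0", "add", ("a", "b")),
    Node("t1", "sub", ("a", "b")),
    Node("t2", "mul", ("t0", "t1")),
]

unit_of, routes = configure(DFG, num_units=4)
result = execute(DFG, {"a": 5, "b": 3})
print(unit_of, routes, result["t2"])
```

In this sketch the configuration step is a dictionary build and a list comprehension, which loosely mirrors why configuring an overlay can be much faster than running synthesis: only unit assignments and switch settings change, not the underlying circuit.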

[1] Tarek S. Abdelrahman, et al. A Characterization of Traces in Java Programs, 2005, PLC.

[2] Alex K. Jones, et al. An FPGA-based VLIW processor with custom hardware execution, 2005, FPGA '05.

[3] Jonathan Rose, et al. Application-specific customization of soft processor microarchitecture, 2006, FPGA '06.

[4] Nachiket Kapre, et al. Packet Switched vs. Time Multiplexed FPGA Overlay Networks, 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[5] Neil W. Bergmann, et al. QUKU: A FPGA Based Flexible Coarse Grain Architecture Design Paradigm using Process Networks, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[6] Tarek S. Abdelrahman, et al. Automatic Trace-Based Parallelization of Java Programs, 2007, 2007 International Conference on Parallel Processing (ICPP 2007).

[7] J. Gregory Steffan, et al. Improving Pipelined Soft Processors with Multithreading, 2007, 2007 International Conference on Field Programmable Logic and Applications.

[8] Marc Feeley, et al. Hardware JIT Compilation for Off-the-Shelf Dynamically Reconfigurable FPGAs, 2008, CC.

[9] Rolf Ernst, et al. Application development with the FlexWAFE real-time stream processing architecture for FPGAs, 2009, TECS.

[10] Stephen Dean Brown, et al. Enhancements to FPGA design methodology using streaming, 2009, 2009 International Conference on Field Programmable Logic and Applications.

[11] Gary Smith, et al. High-Level Synthesis: Past, Present, and Future, 2009, IEEE Design & Test of Computers.

[12] Russell Tessier, et al. Application Specific Customization and Scalability of Soft Multiprocessors, 2009, 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines.

[13] Jonathan Rose, et al. Fine-grain performance scaling of soft vector processors, 2009, CASES '09.

[14] James Coole, et al. Intermediate fabrics: Virtual architectures for circuit portability and fast placement and routing, 2010, 2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[15] Guy Lemieux, et al. A CAD framework for Malibu: an FPGA with time-multiplexed coarse-grained elements, 2011, FPGA '11.

[16] Tarek S. Abdelrahman, et al. Towards Synthesis-Free JIT Compilation to Commodity FPGAs, 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.