A Data-Driven CGRA Overlay Architecture with Embedded Processors

Implementing an algorithm on an FPGA is intrinsically more difficult than programming a processor or a GPU. Processor-based implementations “only” require a program to control their pre-synthesized data path, whereas an FPGA requires the designer to create a new data path and a new controller for each application. Several approaches have recently been proposed to ease FPGA design. The present work builds on Coarse-Grained Reconfigurable Architectures (CGRAs) and Overlay Architectures (OAs), which let a designer take advantage of a pre-compiled FPGA architecture while still configuring the system at a higher level. In the proposed architecture, a generic data-driven compute fabric is interfaced to standard processors. To validate the proposed architecture and design method, an illustrative example is developed in which a processor sends an RGB image to the processing fabric, where it is converted to its Y, Cr, Cb components. Results show that, thanks to a DMA engine between the memory and the fabric, a speedup of 50 is achieved over a pure software implementation running on a MicroBlaze processor.
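To make the illustrative workload concrete, the per-pixel arithmetic can be sketched in C. The abstract does not state which coefficients or number format the fabric uses, so this sketch assumes the full-range ITU-R BT.601 (JFIF) equations evaluated in integer fixed point with 16 fractional bits, the kind of multiply/add/shift datapath that maps naturally onto a coarse-grained fabric without floating-point units. The function names `rgb_to_ycbcr` and `convert_frame` are hypothetical and not taken from the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an integer to the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(int32_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical per-pixel kernel (assumption: full-range BT.601/JFIF).
 * Coefficients are scaled by 2^16; each row of coefficients sums to
 * 65536 (Y) or 0 (Cb, Cr). The +128 chroma offset is added before the
 * shift so the shifted value is never negative. */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr) {
    const int32_t half = 1 << 15;        /* rounding term for >> 16 */
    const int32_t offset = 128 << 16;    /* chroma offset, pre-scaled */
    int32_t yv  = ( 19595 * r + 38470 * g +  7471 * b + half) >> 16;
    int32_t cbv = (-11059 * r - 21709 * g + 32768 * b + offset + half) >> 16;
    int32_t crv = ( 32768 * r - 27439 * g -  5329 * b + offset + half) >> 16;
    *y  = clamp_u8(yv);
    *cb = clamp_u8(cbv);                 /* pure blue would give 256 */
    *cr = clamp_u8(crv);                 /* pure red would give 256 */
}

/* Convert n interleaved RGB pixels (R,G,B,R,G,B,...) into interleaved
 * Y,Cb,Cr triples: the loop a pure-software baseline would execute. */
void convert_frame(const uint8_t *rgb, uint8_t *ycbcr, size_t n_pixels) {
    for (size_t i = 0; i < n_pixels; ++i) {
        const uint8_t *p = rgb + 3 * i;
        uint8_t *q = ycbcr + 3 * i;
        rgb_to_ycbcr(p[0], p[1], p[2], &q[0], &q[1], &q[2]);
    }
}
```

With these assumptions, white maps to (255, 128, 128) and black to (0, 128, 128). On a MicroBlaze, `convert_frame` would be the software baseline; in the proposed architecture, the same multiply-accumulate operations would instead be carried out by the fabric as pixels stream in through the DMA engine, which is where the reported speedup comes from.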
