Paper information - A data driven CGRA Overlay Architecture with embedded processors

Implementing an algorithm on an FPGA is intrinsically more difficult than programming a processor or a GPU. Processor-based implementations "only" require a program to control their pre-synthesized data path, whereas an FPGA requires the designer to create a new data path and a new controller for each application. Several approaches have been proposed recently to ease FPGA design. The present work builds on Coarse-Grained Reconfigurable Architectures (CGRAs) and Overlay Architectures (OAs), which let a designer take advantage of a pre-compiled FPGA architecture while still configuring the system at a higher level. In the proposed architecture, a generic data-driven compute fabric is interfaced to standard processors. To validate the proposed architecture and design method, an illustrative example is developed in which a processor sends an RGB image to a processing fabric, where it is converted to Y, Cr, Cb. Results show that, thanks to a DMA between the memory and the fabric, a speedup factor of 50 is reached compared to a pure software implementation running on a MicroBlaze processor.
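For orientation, the kind of per-pixel computation used in the illustrative example can be sketched in software as follows. This is a minimal sketch, not the authors' implementation: the abstract does not state which conversion coefficients are used, so the full-range ITU-R BT.601 coefficients (as used in JFIF) are assumed here, and the function names `rgb_to_ycrcb` and `convert_image` are hypothetical.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to an (Y, Cr, Cb) triple.

    Assumption: full-range BT.601 (JFIF) coefficients; the paper's
    abstract does not specify which variant the fabric implements.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        # Round to the nearest integer and saturate to the 8-bit range.
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cr), clamp(cb)


def convert_image(pixels):
    """Convert a flat list of (R, G, B) tuples, one pixel at a time."""
    return [rgb_to_ycrcb(r, g, b) for (r, g, b) in pixels]


if __name__ == "__main__":
    sample = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
    for rgb, ycrcb in zip(sample, convert_image(sample)):
        print(rgb, "->", ycrcb)
```

Each output pixel depends only on the matching input pixel, which is why this workload suits a streaming, data-driven fabric fed by DMA: a sequential loop like the one above corresponds to the pure software baseline, while the fabric can process pixels as they arrive.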

Yvon Savaria | Himan Khanzadi | Jean-Pierre David

[1] Yvon Savaria, et al. Mapping applications on two-level configurable hardware, 2015, 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS).

[2] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.

[3] Guy Lemieux, et al. ZUMA: An Open FPGA Overlay Architecture, 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.

[4] Chen Chang, et al. BEE3: Revitalizing Computer Architecture Research, 2009.

[5] Guy Lemieux, et al. Rapid Overlay Builder for Xilinx FPGAs, 2015, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines.

[6] Wilson Jose, et al. A Many-Core Co-Processor for Embedded Parallel Computing on FPGA, 2015, 2015 Euromicro Conference on Digital System Design.

[7] James Coole, et al. Intermediate Fabrics: Virtual Architectures for Near-Instant FPGA Compilation, 2011, IEEE Embedded Systems Letters.

[8] Jörg Henkel, et al. Floating point acceleration for stream processing applications in dynamically reconfigurable processors, 2015, 2015 13th IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia).

[9] Marco Platzner, et al. A Triple Hybrid Interconnect for Many-Cores: Reconfigurable Mesh, NoC and Barrier, 2010, 2010 International Conference on Field Programmable Logic and Applications.

[10] Heiner Giefers. Reconfigurable many-cores with lean interconnect, 2008, 2008 International Conference on Field Programmable Logic and Applications.

[11] B. Ramakrishna Rau, et al. Iterative Modulo Scheduling, 1996, International Journal of Parallel Programming.

[12] Yvon Savaria, et al. Two-level configuration for FPGA: A new design methodology based on a computing fabric, 2012, 2012 IEEE International Symposium on Circuits and Systems.

[13] Dionisios N. Pnevmatikatos, et al. Efficient runtime support for embedded MPSoCs, 2013, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS).

[14] Tarek S. Abdelrahman, et al. A high-performance overlay architecture for pipelined execution of data flow graphs, 2013, 2013 23rd International Conference on Field Programmable Logic and Applications.