Time-Multiplexed FPGA Overlay Architectures

This article presents a comprehensive survey of time-multiplexed (TM) FPGA overlays from the research literature. These overlays are categorized by implementation into two groups: processor-based overlays, whose implementation follows that of conventional silicon-based microprocessors; and CGRA-like overlays (after coarse-grained reconfigurable arrays), which consist of an array of either interconnected processor-based functional units or medium-grained arithmetic functional units. Time-multiplexing allows an overlay to change its behavior on a cycle-by-cycle basis while executing an application kernel, and so allows better sharing of the limited FPGA hardware resources. However, most TM overlays suffer from large resource overheads, due either to the underlying processor-like architecture (for processor-based overlays) or to the routing array and instruction storage requirements (for CGRA-like overlays). Reducing the area overhead of CGRA-like overlays, specifically that of the routing network, and making better use of the hard macros in the target FPGA are active areas of research.
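As a rough illustration of the time-multiplexing idea, the following Python sketch models a single functional unit that takes a different instruction from a small instruction store on each cycle, so one unit is shared across every operation of a kernel. It is a toy software model under stated assumptions, not the design of any surveyed overlay: the instruction format, the `OPS` table, and the function `run_time_multiplexed` are all hypothetical names introduced here.

```python
import operator

# Hypothetical operation set of the shared functional unit.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_time_multiplexed(program, inputs):
    """Execute one instruction per cycle on a single shared functional unit.

    program: list of (opcode, dst, src_a, src_b) tuples, standing in for the
             instruction storage of a time-multiplexed overlay (assumed format).
    inputs:  dict mapping register names to initial values.
    Returns the final register contents and the number of cycles used.
    """
    regs = dict(inputs)  # register file holding inputs and intermediates
    cycles = 0
    for opcode, dst, src_a, src_b in program:
        # The same functional unit changes behavior every cycle.
        regs[dst] = OPS[opcode](regs[src_a], regs[src_b])
        cycles += 1
    return regs, cycles


# Example kernel: y = (a + b) * (a - b)
program = [
    ("add", "t0", "a", "b"),
    ("sub", "t1", "a", "b"),
    ("mul", "y", "t0", "t1"),
]
regs, cycles = run_time_multiplexed(program, {"a": 5, "b": 3})
print(regs["y"], cycles)
```

In this model the three-operation kernel occupies one functional unit for three cycles, whereas a fully spatial mapping would dedicate one unit to each operation. The instruction list is the software analogue of the instruction storage that the abstract names as a source of overhead.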
