Placement Optimization for NoC-Enhanced FPGAs

Field-programmable gate array (FPGA) architectures have recently incorporated hardened networks-on-chip (NoCs) to make system-level integration easier and more efficient. However, embedding hard NoCs presents a new challenge for FPGA computer-aided design (CAD): the tools must place circuit netlist primitives not only to minimize total wirelength and critical path delay, but also to account for the NoC traffic patterns between modules, minimizing the aggregate NoC bandwidth they consume and/or meeting latency constraints. This work enables flexible modeling of FPGA architectures with hard NoCs in the open-source Versatile Place and Route (VPR) CAD flow, facilitating both CAD and architecture research. We enhance the placement engine in VPR to co-optimize traditional circuit implementation metrics (e.g., wirelength, critical path delay) and NoC performance metrics (e.g., congestion, bandwidth utilization, latency) when mapping an application design with NoC-attached modules to a candidate NoC-enhanced FPGA architecture. We test our VPR enhancements on a variety of synthetic benchmarks and verify that the placement engine can effectively optimize NoC aggregate bandwidth and meet specified latency constraints. We then present a complete flow that integrates VPR with RAD-Sim, a high-level SystemC architecture simulator that can capture the NoC traffic flows of complete application designs, and we use these flows to drive VPR's placement optimizations. We showcase this combined flow using a real application design from the deep learning domain. The results show that our NoC-enhanced VPR flow achieves a 2x reduction in NoC aggregate bandwidth (on average) compared to a NoC-agnostic flow, without affecting the design's wirelength or critical path delay.
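
To make the co-optimization idea concrete, the following is a minimal sketch of a NoC-aware placement cost. It is not VPR's actual cost function: the mesh topology, the hop-count routing model, the definition of aggregate bandwidth as flow bandwidth times links traversed, and all names (`Flow`, `noc_cost`, `total_cost`, `noc_weight`, `latency_weight`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A hypothetical traffic flow between two NoC-attached modules."""
    src: str
    dst: str
    bandwidth: float    # bandwidth demand of the flow (arbitrary units)
    max_latency: float  # latency constraint, in the same units as hop_latency


def hops(a, b):
    """Links traversed between routers a and b on a mesh, assuming
    minimal dimension-ordered routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def noc_cost(placement, flows, hop_latency=1.0, latency_weight=10.0):
    """Aggregate bandwidth plus a penalty for violated latency constraints.

    placement maps each module name to the (x, y) router it is attached to.
    Aggregate bandwidth is modeled as the sum over flows of
    bandwidth * links traversed (an assumption of this sketch).
    """
    aggregate_bandwidth = 0.0
    latency_penalty = 0.0
    for f in flows:
        h = hops(placement[f.src], placement[f.dst])
        aggregate_bandwidth += f.bandwidth * h
        # Only latency in excess of the constraint is penalized.
        latency_penalty += max(0.0, h * hop_latency - f.max_latency)
    return aggregate_bandwidth + latency_weight * latency_penalty


def total_cost(circuit_cost, noc, noc_weight=0.5):
    """Blend a traditional circuit cost (e.g., wirelength) with the NoC cost."""
    return (1.0 - noc_weight) * circuit_cost + noc_weight * noc


flows = [Flow("a", "b", bandwidth=4.0, max_latency=2.0),
         Flow("a", "c", bandwidth=1.0, max_latency=5.0)]
far = {"a": (0, 0), "b": (3, 0), "c": (0, 1)}   # heavy flow spans 3 links
near = {"a": (0, 0), "b": (0, 1), "c": (3, 0)}  # heavy flow spans 1 link

print(noc_cost(far, flows), noc_cost(near, flows))
```

Under these assumptions, a placer that evaluates `total_cost` for each candidate move would favor the `near` placement, since it puts the high-bandwidth flow on a shorter route and satisfies both latency constraints.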
