Dependency-Driven Trace-Based Network-on-Chip Emulation on FPGAs

FPGA emulation is a promising approach to accelerating Network-on-Chip (NoC) modeling, which has traditionally relied on software simulators. Most early studies of FPGA-based NoC emulators considered only synthetic workloads such as uniform and bit-permutation traffic. Although a carefully designed set of synthetic workloads can cover the characteristics of the NoC under evaluation relatively thoroughly, synthetic workloads alone are insufficient, especially when the NoC must be optimized for specific applications. In such cases, trace-driven workloads are effective. However, recent studies have pointed out a problem with conventional trace-driven workloads: because dependencies between packets are not considered, the network load and congestion may be distorted. These studies also provide infrastructures for extending existing software simulators to enforce dependencies between packets. Unfortunately, enforcing packet dependencies is not trivial in the FPGA emulation approach. As a result, although some recent FPGA-based NoC emulators support trace-driven workloads, most of them ignore packet dependencies. In this paper, we first clarify the challenges of supporting trace-driven workloads that take packet dependencies into account in the FPGA emulation approach. We then propose efficient methods and architectures to tackle these challenges and, based on these proposals, build an FPGA-based NoC emulator, which we call DNoC.
Our evaluation results show that (1) on a VC707 FPGA board, DNoC achieves an average speed of 10,753K cycles/s when emulating an 8x8 NoC with trace data collected from full-system simulation of the PARSEC benchmark suite, which is 274x higher than the speed reported in a recent related work on dependency-driven trace-based NoC emulation on FPGAs; (2) compared to BookSim, one of the most popular NoC simulators, DNoC is 395x faster while providing the same results; and (3) DNoC can scale to a 4,096-node NoC on a VC707 board, and the size of the largest NoC depends only on the on-chip memory capacity of the target FPGA.
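To illustrate why ignoring packet dependencies can distort network load, the following is a minimal Python sketch of trace replay with and without dependency enforcement. It is not the DNoC architecture: the `Packet` record, the `replay` function, and the fixed-latency network model are hypothetical simplifications introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    pid: int           # packet id (hypothetical trace field)
    cycle: int         # injection cycle recorded in the trace
    deps: tuple = ()   # ids of packets that must be delivered first


def replay(trace, latency, enforce_deps):
    """Replay a trace on a toy network where every packet takes `latency` cycles.

    Returns {pid: cycle at which the packet was injected}.
    Assumes every dependency id refers to a packet in the trace.
    """
    delivered = {}   # pid -> cycle at which the packet arrives
    injected = {}    # pid -> cycle at which the packet was injected
    pending = sorted(trace, key=lambda p: p.cycle)
    now = 0
    while pending:
        still_waiting = []
        for p in pending:
            # A dependency is satisfied once the earlier packet has arrived.
            deps_ok = all(delivered.get(d, float("inf")) <= now for d in p.deps)
            if p.cycle <= now and (deps_ok or not enforce_deps):
                injected[p.pid] = now
                delivered[p.pid] = now + latency
            else:
                still_waiting.append(p)
        pending = still_waiting
        now += 1
    return injected


# A chain of three dependent packets, e.g. request -> reply -> next request.
trace = [Packet(0, 0), Packet(1, 5, (0,)), Packet(2, 6, (1,))]

naive = replay(trace, latency=20, enforce_deps=False)
dependent = replay(trace, latency=20, enforce_deps=True)
print(naive)      # timestamps from the trace are used as-is
print(dependent)  # each packet waits for its predecessor to arrive
```

In this sketch, when the evaluated network is slower than the one on which the trace was captured, the timestamp-only replay injects packets 1 and 2 before the packets they depend on have arrived, whereas the dependency-aware replay delays them, so the two replays present different loads to the network.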
