ASIR: Application-Specific Instruction-Set Router for NoC-Based MPSoCs

The end of Dennard scaling led to the adoption of heterogeneous multi-processor systems-on-chip (MPSoCs). Heterogeneous MPSoCs achieve high energy efficiency and performance because each processing element (PE) can be optimized for an application task. However, the number of PEs in MPSoCs keeps growing, which drives up communication costs to the point where communication tends to become the performance bottleneck. Networks-on-chip (NoCs) are a promising and scalable intra-chip communication technology for MPSoCs, but exploiting them efficiently requires novel and effective programming methodologies. This work presents a novel router architecture, the application-specific instruction-set router (ASIR), for field-programmable gate array (FPGA)-based MPSoCs. ASIR combines data transfers with application-specific processing by adding high-level synthesized processing units to the routers of the NoC. Executing application-specific operations while data is exchanged between PEs makes efficient use of the transmission time. The processing units are programmed in C/C++ using high-level synthesis, so they can be optimized specifically for an application. With this approach, transferred data can be processed either by a PE, such as a MicroBlaze processor, before the transmission or by a router during the transmission. In addition, a static mapping algorithm is introduced for applications modeled as a Kahn process network-based graph; it maps tasks to the MicroBlaze processors and the router processing units. The mapping algorithm optimizes the communication cost by allocating tasks to nearest-neighbor PEs. The complete methodology significantly simplifies the design and programming of ASIR-based MPSoCs and efficiently exploits the heterogeneous processing capabilities of the routers and the MicroBlaze processors.
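The idea of a static mapping that places communicating tasks on neighboring PEs can be illustrated with a small sketch. This is not the mapping algorithm of the paper, whose details are not given in the abstract. It is a generic greedy heuristic under stated assumptions: the NoC is a 2D mesh, every tile hosts one task, the cost of a channel is its data volume times the Manhattan distance between the two tiles, and the distinction between MicroBlaze processors and router processing units is ignored. All names (`map_tasks`, `communication_cost`, the task labels) are hypothetical.

```python
def manhattan(a, b):
    """Hop distance between two (x, y) tiles of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_tasks(channels, mesh_w, mesh_h):
    """Greedily place tasks of a process-network graph on a mesh.

    channels: dict {(producer, consumer): data_volume} (assumed input format).
    Returns a dict {task: (x, y)}.
    """
    free = [(x, y) for y in range(mesh_h) for x in range(mesh_w)]
    tasks = {t for edge in channels for t in edge}
    if len(tasks) > len(free):
        raise ValueError("more tasks than tiles")

    placement = {}
    # Visit the heaviest channels first so their endpoints end up adjacent.
    ordered = sorted(channels.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, dst), _ in ordered:
        for task in (src, dst):
            if task in placement:
                continue

            def cost(tile):
                # Volume-weighted distance to already placed partner tasks.
                total = 0
                for (a, b), vol in channels.items():
                    other = b if a == task else a if b == task else None
                    if other in placement:
                        total += vol * manhattan(tile, placement[other])
                return total

            best = min(free, key=cost)  # ties resolve to the first free tile
            free.remove(best)
            placement[task] = best
    return placement


def communication_cost(channels, placement):
    """Total volume-weighted hop count of a placement."""
    return sum(vol * manhattan(placement[a], placement[b])
               for (a, b), vol in channels.items())


# Hypothetical three-task pipeline on a 2x2 mesh.
channels = {("src", "filter"): 8, ("filter", "sink"): 4, ("src", "sink"): 1}
placement = map_tasks(channels, 2, 2)
total = communication_cost(channels, placement)
```

In this example the two heaviest channels each span a single hop, and only the light `src` to `sink` channel crosses two hops. A heuristic of this kind is fast but not guaranteed to find the optimal placement.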
