A 64-PE folded-torus intra-chip communication fabric for guaranteed throughput in Network-on-Chip based applications

This paper presents the design of a 64-PE folded-torus intra-chip communication fabric that provides guaranteed throughput through deadlock-free, livelock-free, in-order data delivery, making it suitable for NoC-based real-time processing applications. A test chip that uses the proposed fabric to integrate 64 RISC-based processing elements was fabricated in 0.13 µm 1P6M CMOS technology with a die area of 23 mm². At room temperature, the measured peak power of the test chip (all PE tiles activated) is 200 mW at 128 MHz and 1.2 V Vcc. The intra-chip network, which consumes 9.4% of the chip area and 18% of the total chip power, provides a maximum bisection bandwidth of 44.6 Gb/s at approximately 0.14 pJ/bit/hop.
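The percentages and per-hop energy quoted above can be turned into absolute figures with a few lines of arithmetic. The following is a minimal sketch, not the authors' method: the 8 x 8 arrangement of the 64 PEs, minimal (shortest-path) routing, and all function names are assumptions made for illustration, since the abstract does not state the torus dimensions or the routing algorithm. Folding a torus changes the physical wire layout, not the logical hop count, so the hop distance below is the ordinary wraparound distance.

```python
# Figures quoted in the abstract.
CHIP_POWER_MW = 200.0          # peak power, all PE tiles active
CHIP_AREA_MM2 = 23.0           # die area
NETWORK_POWER_SHARE = 0.18     # network share of total chip power
NETWORK_AREA_SHARE = 0.094     # network share of chip area
ENERGY_PJ_PER_BIT_HOP = 0.14   # approximate transport energy

# Assumption (not stated in the abstract): the 64 PEs form an 8 x 8 torus.
ROWS, COLS = 8, 8


def network_power_mw():
    """Absolute network power implied by the quoted share."""
    return CHIP_POWER_MW * NETWORK_POWER_SHARE


def network_area_mm2():
    """Absolute network area implied by the quoted share."""
    return CHIP_AREA_MM2 * NETWORK_AREA_SHARE


def torus_hops(src, dst, rows=ROWS, cols=COLS):
    """Minimal hop count between two (row, col) nodes with wraparound links."""
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def transport_energy_pj(bits, hops):
    """Energy to move `bits` bits across `hops` hops at the quoted rate."""
    return bits * hops * ENERGY_PJ_PER_BIT_HOP


if __name__ == "__main__":
    print(f"network power: {network_power_mw():.1f} mW")
    print(f"network area:  {network_area_mm2():.3f} mm^2")
    hops = torus_hops((0, 0), (4, 4))  # farthest node in an 8 x 8 torus
    print(f"worst-case hops: {hops}")
    print(f"32-bit word over {hops} hops: {transport_energy_pj(32, hops):.2f} pJ")
```

Under these assumptions the network accounts for about 36 mW and about 2.16 mm², and moving a hypothetical 32-bit word between the two most distant nodes (8 hops) costs roughly 36 pJ.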
