A 65nm 39GOPS/W 24-core processor with 11Tb/s/W packet-controlled circuit-switched double-layer network-on-chip and heterogeneous execution array

With the increasing complexity and variety of applications, programmable multi-core processors are drawing attention for their high flexibility and low implementation cost, yet their performance and energy efficiency still fall short of the demands of many compute-intensive applications. This paper describes a high-performance, energy-efficient 24-core processor for multimedia and communication applications, with the following key features: (1) a packet-controlled circuit-switched double-layer network-on-chip (NoC), which provides 11Tb/s/W energy efficiency with 435Gb/s bisection bandwidth; (2) a cluster-shared, NoC-connected heterogeneous reconfigurable execution array, which improves the performance of frequently used computations in multimedia and communication applications by over 6×; (3) memory hierarchy improvements, including a multi-page foreground and background register file, and memory splitting and sharing. The processor, implemented in TSMC 65nm LP CMOS and occupying 18.8mm² (Fig. 3.6.7), operates at 850MHz at 1.2V, with 523mW power dissipation and 39GOPS/W (26pJ/operation) energy efficiency, which is 1.75× better than our former 16-core processor [4].
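The headline figures above are related by simple unit arithmetic, which the following minimal Python sketch spells out. The function names are illustrative and not from the paper. The throughput, NoC power, and predecessor-efficiency values are derived here from the quoted numbers, not reported in the abstract, and the NoC power estimate assumes the 11Tb/s/W figure applies at the full 435Gb/s bisection bandwidth.

```python
# Figures quoted in the abstract.
GOPS_PER_WATT = 39.0        # processor energy efficiency
POWER_W = 0.523             # power dissipation at 850MHz, 1.2V
NOC_TBPS_PER_WATT = 11.0    # NoC energy efficiency
BISECTION_GBPS = 435.0      # NoC bisection bandwidth
SPEEDUP_VS_FORMER = 1.75    # efficiency gain over the former 16-core chip


def energy_per_op_pj(gops_per_watt: float) -> float:
    """Convert GOPS/W to pJ/operation: 1 W per Gop/s is 1 nJ/op = 1000 pJ/op."""
    return 1000.0 / gops_per_watt


def throughput_gops(gops_per_watt: float, power_w: float) -> float:
    """Derived throughput in GOPS implied by the efficiency and the power."""
    return gops_per_watt * power_w


def noc_power_mw(bandwidth_gbps: float, tbps_per_watt: float) -> float:
    """Derived NoC power in mW, assuming the efficiency holds at this bandwidth."""
    watts = bandwidth_gbps / (tbps_per_watt * 1000.0)  # Tb/s/W -> Gb/s/W
    return watts * 1000.0


def former_gops_per_watt(gops_per_watt: float, ratio: float) -> float:
    """Efficiency of the predecessor implied by the quoted improvement ratio."""
    return gops_per_watt / ratio


if __name__ == "__main__":
    print(f"{energy_per_op_pj(GOPS_PER_WATT):.1f} pJ/op")
    print(f"{throughput_gops(GOPS_PER_WATT, POWER_W):.1f} GOPS (derived)")
    print(f"{noc_power_mw(BISECTION_GBPS, NOC_TBPS_PER_WATT):.1f} mW (derived, assumed)")
    print(f"{former_gops_per_watt(GOPS_PER_WATT, SPEEDUP_VS_FORMER):.1f} GOPS/W (implied)")
```

The first conversion gives about 25.6pJ per operation, consistent with the rounded 26pJ/operation stated above.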

[1] Sanu Mathew, et al. A 4.1Tb/s bisection-bandwidth 560Gb/s/W streaming circuit-switched 8×8 mesh network-on-chip in 45nm CMOS. IEEE International Solid-State Circuits Conference (ISSCC), 2010.

[2] Timothy Mattson, et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS. IEEE International Solid-State Circuits Conference (ISSCC), 2010.

[3] Tatsuya Mori, et al. A Power, Performance Scalable Eight-Cores Media Processor for Mobile Multimedia Applications. IEEE Journal of Solid-State Circuits, 2009.

[4] Zhiyi Yu, et al. An 800MHz 320mW 16-core processor with message-passing and shared-memory inter-core communication mechanisms. IEEE International Solid-State Circuits Conference (ISSCC), 2012.

[5] B. Reese, et al. Real-time H.24-AVC codec on Intel architectures. International Conference on Image Processing (ICIP '04), 2004.