A 300 mV 494GOPS/W Reconfigurable Dual-Supply 4-Way SIMD Vector Processing Accelerator in 45 nm CMOS

This paper describes a reconfigurable 4-way SIMD engine fabricated in 45 nm high-k/metal-gate CMOS, targeted at on-die acceleration of vector processing in power-constrained mobile microprocessors. The SIMD accelerator can be reconfigured to perform 4-way 16b × 16b multiplies, a 32b × 32b multiply, 4-way 16b additions, 2-way 32b additions, or a 72b addition, with single-cycle throughput over a wide supply voltage range (1.3 V down to 230 mV). A reconfigurable 2 × 2 tile of signed 2's complement 16b multipliers, with conditional carry gating in the 72b sparse tree adder, dual supplies for voltage hopping, and fine-grained power gating, enables a peak energy efficiency of 494 GOPS/W (measured at 300 mV, 50°C) in a dense layout occupying 0.081 mm² while achieving: (i) scalable performance up to 2.8 GHz, 278 mW measured at 1.3 V; (ii) fast single-cycle switching between any operating/idle mode; (iii) configuration-dependent power reduction of up to 41% in total power and 6.5× in active leakage power; (iv) 10× standby leakage reduction during idle mode; (v) deep subthreshold operation measured at 230 mV, 8.8 MHz, 87 μW; and (vi) compensation for up to 3× performance variation in ultra-low voltage mode.
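To illustrate how a 2 × 2 tile of 16b multipliers can serve both the 4-way 16b × 16b mode and the single 32b × 32b mode, the following Python sketch composes a signed 32b product from four 16b-wide partial products. The abstract does not specify how the hardware combines the partial products; the schoolbook decomposition used here, and the helper names `split16` and `mul32_from_16`, are assumptions made for illustration only.

```python
from typing import Tuple


def split16(x: int) -> Tuple[int, int]:
    """Split a signed 32-bit value into (signed high half, unsigned low half)."""
    assert -(1 << 31) <= x < (1 << 31), "operand must fit in signed 32 bits"
    lo = x & 0xFFFF   # low 16 bits, treated as unsigned
    hi = x >> 16      # arithmetic shift, so the sign stays in the high half
    return hi, lo


def mul32_from_16(a: int, b: int) -> int:
    """Signed 32b x 32b product built from four 16b-wide partial products.

    Assumed decomposition (not taken from the paper):
        a = a_hi * 2**16 + a_lo,  b = b_hi * 2**16 + b_lo
        a * b = a_hi*b_hi * 2**32 + (a_hi*b_lo + a_lo*b_hi) * 2**16 + a_lo*b_lo
    """
    a_hi, a_lo = split16(a)
    b_hi, b_lo = split16(b)

    # One partial product per multiplier in the hypothetical 2 x 2 tile.
    p_ll = a_lo * b_lo   # unsigned x unsigned
    p_lh = a_lo * b_hi   # unsigned x signed
    p_hl = a_hi * b_lo   # signed x unsigned
    p_hh = a_hi * b_hi   # signed x signed

    # Align the partial products by weight and sum them.
    return (p_hh << 32) + ((p_lh + p_hl) << 16) + p_ll


def mul16_4way(a: Tuple[int, int, int, int],
               b: Tuple[int, int, int, int]) -> Tuple[int, ...]:
    """4-way SIMD mode: four independent 16b x 16b products, one per lane."""
    return tuple(x * y for x, y in zip(a, b))
```

In this sketch the same four multiplications are either kept separate (the 4-way mode) or aligned and summed (the 32b mode). That is one plausible reading of "reconfigurable 2 × 2 tile", not a description of the actual circuit.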
