A 167-Processor Computational Platform in 65 nm CMOS

A 167-processor computational platform, implemented in 65 nm CMOS, consists of an array of simple programmable processors capable of per-processor dynamic supply voltage and clock frequency scaling, three algorithm-specific processors, and three 16 KB shared memories. All processors and shared memories are clocked by local, fully independent, dynamically haltable, digitally programmable oscillators and are interconnected by a configurable circuit-switched network that supports long-distance communication. Programmable processors each occupy 0.17 mm² and operate at a maximum clock frequency of 1.2 GHz at 1.3 V. At 1.2 V, they operate at 1.07 GHz and consume 47.5 mW when 100% active, resulting in an energy dissipation of 44 pJ per operation. At 0.675 V, they operate at 66 MHz and consume 608 µW when 100% active, resulting in a total energy dissipation of 9.2 pJ per ALU or MAC operation.
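The energy-per-operation figures quoted above can be checked by dividing the active power by the clock frequency. The short Python sketch below does that arithmetic. It assumes one operation is issued per clock cycle when a processor is 100% active, which the abstract does not state explicitly; the function name `energy_per_op_pj` and the `ops_per_cycle` parameter are illustrative and not from the paper.

```python
def energy_per_op_pj(power_watts, freq_hz, ops_per_cycle=1.0):
    """Return energy per operation in picojoules.

    Assumption (not stated in the abstract): a fully active processor
    completes `ops_per_cycle` operations every clock cycle.
    """
    ops_per_second = freq_hz * ops_per_cycle
    return power_watts / ops_per_second * 1e12  # joules -> picojoules


# Operating points taken from the abstract.
high = energy_per_op_pj(47.5e-3, 1.07e9)  # 47.5 mW at 1.07 GHz (1.2 V)
low = energy_per_op_pj(608e-6, 66e6)      # 608 uW at 66 MHz (0.675 V)

print(f"1.2 V:   {high:.1f} pJ per operation")
print(f"0.675 V: {low:.1f} pJ per operation")
print(f"ratio:   {high / low:.1f}x less energy per operation at 0.675 V")
```

Under that assumption the results round to the 44 pJ and 9.2 pJ values given in the abstract, which suggests the quoted figures are simply power divided by clock rate at each operating point.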
