A 5.5GHz 0.84 TOPS/mm² neural network engine with stream architecture and resonant clock mesh

This paper presents an ultra-high-performance neural network engine fabricated in a 65nm CMOS technology. The 0.9mm² core relies on an energy-efficient resonant clock mesh running at 5.5GHz to achieve 0.76 TOPS at 8-bit precision, improving throughput by over 4×, area efficiency by over 8×, and energy-delay-area product by over 1.8× compared with previous state-of-the-art neural network designs. With a charge recovery rate of 63%, the resonant clock mesh enables a deeply pipelined stream architecture and high-speed stream buffers while keeping power consumption below 5W.
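As a rough consistency check on the figures above, the sketch below derives the headline area efficiency from the stated throughput and core area, along with two related back-of-the-envelope quantities. Several inputs are assumptions for illustration and are not stated in the abstract: that the 0.76 TOPS is sustained at the full 5.5GHz clock, the common convention that one multiply-accumulate (MAC) counts as 2 ops, and an idealized clock-power model in which a mesh recovering a fraction of its clock energy dissipates only (1 - recovery) of the conventional C·V²·f power. The helper function names are hypothetical.

```python
# Back-of-the-envelope figures derived from the abstract's numbers.
# ASSUMPTIONS (not from the paper): throughput is sustained at the full clock,
# 1 MAC = 2 ops, and clock-mesh dissipation scales as (1 - recovery) * C*V^2*f.

THROUGHPUT_TOPS = 0.76   # 8-bit TOPS, from the abstract
CORE_AREA_MM2 = 0.9      # core area in mm^2, from the abstract
CLOCK_HZ = 5.5e9         # clock frequency, from the abstract
CHARGE_RECOVERY = 0.63   # fraction of clock energy recovered, from the abstract


def area_efficiency(tops: float, area_mm2: float) -> float:
    """Throughput per unit area, in TOPS/mm^2."""
    return tops / area_mm2


def ops_per_cycle(tops: float, clock_hz: float) -> float:
    """Operations completed per clock cycle, assuming the TOPS figure
    is achieved at the given clock frequency."""
    return tops * 1e12 / clock_hz


def clock_power_fraction(recovery: float) -> float:
    """Fraction of conventional C*V^2*f clock power still dissipated when
    `recovery` of the clock energy is recycled (idealized model)."""
    return 1.0 - recovery


if __name__ == "__main__":
    eff = area_efficiency(THROUGHPUT_TOPS, CORE_AREA_MM2)
    opc = ops_per_cycle(THROUGHPUT_TOPS, CLOCK_HZ)
    print(f"area efficiency: {eff:.2f} TOPS/mm^2")
    print(f"ops per cycle:   {opc:.1f} (~{opc / 2:.0f} MACs if 1 MAC = 2 ops)")
    print(f"clock power vs. CV^2f: {clock_power_fraction(CHARGE_RECOVERY):.2f}")
```

Running it reproduces the title's 0.84 TOPS/mm² (0.76 / 0.9 ≈ 0.844) and suggests roughly 138 ops per cycle at 5.5GHz; the clock-power fraction of 0.37 holds only under the idealized model above, since real resonant meshes also incur inductor and driver losses.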