Design-Technology Space Exploration for Energy Efficient AiMC-Based Inference Acceleration

Extremely energy-efficient convolutional neural network (CNN) inference has recently been enabled by analog in-memory compute (AiMC). Integrating AiMC into a primarily digital inference system brings new challenges, ranging from device specifications to the definition of novel system architectures. A novel framework to evaluate the impact of an AiMC array at the system level is presented. The framework is used to model an SRAM-based 1024 × 512 prototype AiMC array capable of an energy efficiency of up to 675 TMACs/W. The proposed framework allows modelling of different compute cells, array dimensions, operating voltages, and activation buffer energy models, which can be used to determine the overall energy efficiency for various CNN workloads.
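To make the headline metric concrete, the relation between per-MAC energy and TMACs/W can be sketched as below. This is a minimal illustrative model, not the paper's framework: the function name and the buffer-energy parameter are assumptions, and only the 675 TMACs/W figure comes from the abstract.

```python
# Hypothetical sketch of the conversion between per-MAC energy and TMACs/W.
# 1 TMAC/W = 1e12 MACs per joule = 1 MAC per picojoule,
# so TMACs/W = 1000 / (energy per MAC in femtojoules).

def efficiency_tmacs_per_watt(e_array_fj: float, e_buffer_fj: float = 0.0) -> float:
    """Overall efficiency in TMACs/W, given array and (assumed) activation
    buffer energy per MAC in femtojoules."""
    total_fj = e_array_fj + e_buffer_fj
    return 1000.0 / total_fj

# The reported 675 TMACs/W corresponds to roughly 1.48 fJ per MAC at the array:
e_mac_fj = 1000.0 / 675.0
peak = efficiency_tmacs_per_watt(e_mac_fj)
```

Adding a nonzero buffer energy per MAC to `e_buffer_fj` shows how system-level overheads pull the overall efficiency below the array-only peak, which is the kind of trade-off the framework is intended to quantify.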