STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators

The design of specialized architectures for accelerating the inference procedure of Deep Neural Networks (DNNs) is a booming area of research nowadays. While first-generation rigid accelerator proposals used simple fixed dataflows tailored for dense DNNs, more recent architectures have argued for flexibility to efficiently support a wide variety of layer types, dimensions, and sparsity. As the complexity of these accelerators grows, the analytical models currently being used prove unable to capture execution-time subtleties, thus resulting inexact in many cases. We present STONNE (<italic><underline>S</underline>imulation <underline>TO</underline>ol of <underline>N</underline>eural <underline>N</underline>etwork <underline>E</underline>ngines</italic>), a cycle-level microarchitectural simulator for state-of-the-art rigid and flexible DNN inference accelerators that can plug into any high-level DNN framework as an accelerator device, and perform full-model evaluation of both dense and sparse real, unmodified DNN models.