Snowflake: An efficient hardware accelerator for convolutional neural networks

Deep learning is becoming increasingly popular for a wide variety of applications, including object detection, classification, semantic segmentation, and natural language processing. Convolutional neural networks (CNNs) are a class of deep neural networks that achieve high accuracy on these tasks. CNNs are hierarchical mathematical models that require billions of operations to produce a single output. This high computational complexity, combined with the inherent parallelism in these models, makes them an excellent target for custom accelerators. In this work we present Snowflake, a scalable, efficient, low-power accelerator that is agnostic to CNN architecture. Our design achieves an average computational efficiency of 91%, significantly higher than that of comparable architectures. We implemented Snowflake on a Xilinx Zynq XC7Z045 APSoC. On this platform, Snowflake delivers 128 G-ops/s while consuming 9.48 W of power. It achieves a throughput of 98 frames per second and an energy efficiency of 10.3 frames per joule on AlexNet, and 34 frames per second and 3.6 frames per joule on GoogLeNet.
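
The energy-efficiency figures quoted above follow directly from the measured throughput and power; as a quick sanity check (assuming frames per joule is simply frames per second divided by the average power draw of 9.48 W, a reasonable reading of the abstract rather than a statement from the paper itself):

\[
\text{frames/J} \;=\; \frac{\text{frames/s}}{\text{power (W)}}, \qquad
\frac{98~\text{frames/s}}{9.48~\text{W}} \approx 10.3~\text{frames/J}, \qquad
\frac{34~\text{frames/s}}{9.48~\text{W}} \approx 3.6~\text{frames/J}.
\]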