CNN-DMA: A Predictable and Scalable Direct Memory Access Engine for Convolutional Neural Network with Sliding-window Filtering

Memory bandwidth utilization has become the key performance bottleneck for state-of-the-art variants of neural network kernels. Modern layer structures such as depth-wise, point-wise, and atrous convolutions introduce diverse and discontinuous memory access patterns, which hinder efficient activation supply through more frequent cache misses and, consequently, high-penalty DRAM pre-charging. GPUs cope with this through sophisticated CUDA-level optimizations that reduce memory footprints and enable efficient parallelization, but such optimizations demand substantial engineering effort. In this work, we instead propose a programmable direct memory access engine for convolutional neural networks (CNN-DMA) that provides a fast supply of activations to independent and scalable computing units. CNN-DMA follows a predictable activation-streaming approach that avoids penalties from bus contention, cache misses, and less carefully tuned low-level programs. Furthermore, we enhance the baseline DMA with out-of-order data supply that filters out redundant sliding windows, so that only unique windows are forwarded to the computing infrastructure. Experiments on state-of-the-art neural networks show that CNN-DMA achieves optimal DRAM access efficiency for point-wise convolution layers while reducing the number of computation rounds by 30% to 70% through sliding-window filtering.
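
The following is a minimal software sketch, not the authors' hardware design, of the idea behind sliding-window filtering: overlapping convolution windows that contain identical activation values (common after ReLU or coarse quantization) only need to be computed once, and the remaining results can be scattered back via an index map. The array shapes, window size, and the `extract_windows`/`filter_unique_windows` helpers are illustrative assumptions, not part of the CNN-DMA specification.

```python
# Illustrative sketch of sliding-window filtering (assumed helpers, not the CNN-DMA RTL).
import numpy as np

def extract_windows(act, k, stride=1):
    """Enumerate k x k sliding windows of a 2-D activation map in raster order."""
    h, w = act.shape
    wins = []
    for y in range(0, h - k + 1, stride):
        for x in range(0, w - k + 1, stride):
            wins.append(act[y:y + k, x:x + k])
    return wins

def filter_unique_windows(windows):
    """Keep only unique windows; return an index map to scatter results back."""
    seen = {}          # window bytes -> index into `unique`
    unique = []
    index_map = []     # for each original window, index of its unique representative
    for w in windows:
        key = w.tobytes()
        if key not in seen:
            seen[key] = len(unique)
            unique.append(w)
        index_map.append(seen[key])
    return unique, index_map

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Coarsely quantized, ReLU-like activations yield many repeated windows.
    act = np.maximum(rng.integers(-2, 3, size=(32, 32)), 0).astype(np.int8)
    windows = extract_windows(act, k=3)
    unique, index_map = filter_unique_windows(windows)
    saved = 1.0 - len(unique) / len(windows)
    print(f"windows: {len(windows)}, unique: {len(unique)}, "
          f"compute rounds saved: {saved:.0%}")
```

In hardware, the analogous bookkeeping (detecting duplicate windows and recording where their results belong) would be performed by the DMA engine while streaming activations, so the computing units only ever see unique windows; the fraction of duplicates determines how many computation rounds are skipped.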
