Fast Convolution Algorithm for Convolutional Neural Networks

Recent advances in computing power, driven by faster general-purpose graphics processing units (GPGPUs), have enabled increasingly complex convolutional neural network (CNN) models. However, because existing GPGPUs have limited applicability in many deployment environments, dedicated CNN accelerators are becoming increasingly important. Current accelerators focus on improving memory scheduling and hardware architecture, so the number of multiply-accumulate (MAC) operations itself is not reduced. In this study, a new convolution-layer algorithm based on a coarse-to-fine method is proposed instead of a hardware or architectural approach. The algorithm is shown to reduce MAC operations by 33%, while Top-1 accuracy decreases by only 3% and Top-5 accuracy by only 1%.
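The abstract does not spell out the algorithm itself, but the general flavor of a coarse-to-fine convolution that skips MAC operations can be sketched as follows. The snippet below is a hypothetical illustration, not the paper's method: a cheap coarse pass using a subsampled kernel ranks output positions, and the full convolution is evaluated only for the top fraction of them, so the remaining MACs are skipped. The function name `conv2d_coarse_to_fine`, the tap-subsampling mask, and the `keep_ratio` parameter are all assumptions made for this sketch.

```python
import numpy as np

def conv2d_coarse_to_fine(x, w, keep_ratio=0.67):
    """Illustrative coarse-to-fine 2-D convolution (single channel, valid padding).

    Coarse pass: estimate each output cheaply with a subsampled kernel
    (every other tap). Fine pass: run the full convolution only at the
    outputs whose coarse estimate ranks in the top `keep_ratio` fraction;
    the other outputs keep the coarse value, so their remaining MACs are skipped.
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1

    # Coarse pass: use only a subset of the kernel taps (fewer MACs per output).
    mask = np.zeros_like(w, dtype=bool)
    mask[::2, ::2] = True
    coarse = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            coarse[i, j] = np.sum(patch[mask] * w[mask])

    # Select the outputs worth refining (largest coarse magnitudes).
    k = int(keep_ratio * oh * ow)
    refine_idx = np.argsort(np.abs(coarse).ravel())[::-1][:k]
    refine = np.zeros(oh * ow, dtype=bool)
    refine[refine_idx] = True
    refine = refine.reshape(oh, ow)

    # Fine pass: full convolution only where selected; elsewhere keep the coarse estimate.
    out = coarse.copy()
    for i in range(oh):
        for j in range(ow):
            if refine[i, j]:
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

# Example usage on random data: a 32x32 input with a 3x3 kernel.
x = np.random.randn(32, 32)
w = np.random.randn(3, 3)
y = conv2d_coarse_to_fine(x, w)
```

In this sketch the MAC savings come from never computing the full dot product at the discarded output positions; how the actual paper selects which computations to skip, and how it preserves accuracy, is specified in the paper itself rather than here.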
