An efficient implementation of 2D convolution in CNN

Convolutional neural networks (CNNs), a well-known class of machine learning algorithms, are widely used in computer vision because of their strong performance in image classification. With the rapid growth of CNN-based applications, various acceleration schemes have been proposed on FPGAs, GPUs, and ASICs. In these hardware accelerators, the most challenging component is the implementation of the 2D convolution. To obtain a more efficient 2D convolution design for CNNs, this paper proposes a novel technique, singular value decomposition approximation (SVDA), to reduce resource usage. Experimental results show that the proposed SVDA hardware implementation reduces resource usage by 14.46% to 37.8%, while the loss of classification accuracy is less than 1%.
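
As a minimal illustration of the idea behind SVD-based kernel approximation (a sketch of the general technique, not the paper's hardware design), the Python snippet below factorizes a K x K kernel with the SVD and keeps only the top-r rank-1 terms, so one 2D convolution is replaced by r pairs of 1D convolutions. The function name, the `rank` parameter, and the use of NumPy/SciPy are illustrative assumptions; multiplications per output pixel drop from K*K to roughly 2*r*K, which is the kind of resource saving the paper targets.

```python
import numpy as np
from scipy.signal import convolve2d


def svd_approx_conv2d(image, kernel, rank=1):
    """Approximate convolve2d(image, kernel, mode='valid') with `rank` separable terms."""
    U, s, Vt = np.linalg.svd(kernel)                 # kernel = U @ diag(s) @ Vt
    H, W = image.shape
    K1, K2 = kernel.shape
    out = np.zeros((H - K1 + 1, W - K2 + 1))
    for i in range(rank):
        col = (U[:, i] * s[i]).reshape(-1, 1)        # vertical 1D filter (K1 x 1)
        row = Vt[i, :].reshape(1, -1)                # horizontal 1D filter (1 x K2)
        # Two 1D passes replace one 2D pass for this rank-1 term.
        out += convolve2d(convolve2d(image, col, mode='valid'), row, mode='valid')
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.standard_normal((32, 32))
    ker = rng.standard_normal((5, 5))
    exact = svd_approx_conv2d(img, ker, rank=5)      # rank 5 = exact for a 5x5 kernel
    approx = svd_approx_conv2d(img, ker, rank=2)
    print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

Because convolution is associative and the convolution of a column vector with a row vector equals their outer product, summing the separable passes over the kept singular components reproduces the rank-r approximation of the original kernel; the accuracy/resource trade-off is controlled by how many components are retained.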
