Modified distributed-arithmetic-based low-complexity CNN architecture design methodology

CNNs involve a large number of convolutions between feature maps and kernels, which are necessary for extracting the features needed for accurate classification. However, these convolutions require computationally intensive, power- and area-hungry multiplications, limiting CNN deployment on embedded devices under resource-constrained scenarios. To address this problem, we propose a modified distributed arithmetic (DA) based, low-complexity, multiplier-less CNN architecture design methodology. To the best of our knowledge, this is the first work of its kind in which DA is modified and applied to CNNs to realize a low-complexity architecture, resulting in a power- and area-efficient design. The architecture is implemented and prototyped on a Xilinx Spartan-6 FPGA and synthesized on an ASIC platform using Synopsys Design Compiler with the TSMC 130 nm technology library. The synthesis results and comparative analysis show that the proposed modified DA based methodology requires fewer computations and lower complexity, achieving up to 1.49x and 36.70x less area and 1.17x and 121.13x less power compared to the direct-multiplication and AND-OR based conventional multiplication methodologies, respectively.
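For context, the sketch below illustrates the core idea of conventional (unmodified) distributed arithmetic that the methodology builds on: the fixed kernel weights are folded into a precomputed look-up table, and each inner product is then evaluated bit-serially using only table look-ups, shifts, and additions, with no multipliers. This is a minimal software model for clarity only; the function names, bit width, and LUT organization are illustrative assumptions and do not represent the paper's modified DA scheme or its hardware implementation.

    def build_da_lut(weights):
        """Precompute the 2^K-entry DA look-up table for K fixed kernel weights.
        lut[addr] = sum of weights[k] for every k whose bit is set in addr."""
        K = len(weights)
        lut = [0] * (1 << K)
        for addr in range(1 << K):
            total = 0
            for k in range(K):
                if (addr >> k) & 1:
                    total += weights[k]
            lut[addr] = total
        return lut

    def da_inner_product(lut, inputs, n_bits=8):
        """Multiplier-less inner product via bit-serial distributed arithmetic.
        `inputs` are signed integers representable in n_bits two's complement."""
        acc = 0
        for b in range(n_bits):
            # Form the LUT address from bit b of every input sample.
            addr = 0
            for k, x in enumerate(inputs):
                bit = ((x + (1 << n_bits)) >> b) & 1  # two's-complement bit b of x
                addr |= bit << k
            partial = lut[addr]
            if b == n_bits - 1:
                acc -= partial << b   # the sign bit carries negative weight
            else:
                acc += partial << b
        return acc

As a usage check under these assumptions, with weights [3, -2], inputs [5, -3], and n_bits=8, da_inner_product(build_da_lut([3, -2]), [5, -3]) returns 21, matching the direct multiply-accumulate result 3*5 + (-2)*(-3); the kernel-dependent LUT is built once and reused for every input window, which is what removes the multipliers from the datapath.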
