Deep Neural Networks Characterization Framework for Efficient Implementation on Embedded Systems

Bio-inspired machine learning algorithms, such as Convolutional Neural Networks (CNNs), offer interesting solutions to complex real-life problems that cannot be easily modeled. Applications involving image recognition and object detection can greatly benefit from these approaches. Furthermore, their intrinsically regular, parallel structure offers opportunities for hardware acceleration. However, moving compute- and memory-intensive CNNs to embedded systems while maintaining high energy efficiency remains challenging. This paper presents the first step of a generic framework targeting the characterization of neural network algorithms to improve their implementation on embedded systems. The presented approach aims to reduce the gap between the fast-changing landscape of applications based on artificial intelligence and the hardware targets. The framework computes different metrics from neural network descriptions (such as computation and memory needs, or data locality and reuse) to derive appropriate implementation strategies or configurations of target architectures. Based on the outputs of the framework, new neural network topologies can be quickly studied, reducing the time-to-market of new systems.
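To make this kind of metric extraction concrete, the sketch below derives two of the metrics mentioned above, MAC count and weight-memory footprint, for a single convolutional layer from a plain layer description. This is a minimal illustration under assumed conventions, not the paper's actual framework: the `ConvLayer` record, the `conv_metrics` function, and the LeNet-5-style example layer are hypothetical names chosen here for clarity.

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    """Minimal description of a convolutional layer (hypothetical schema)."""
    in_channels: int
    out_channels: int
    kernel_size: int   # assumes square kernels
    out_height: int
    out_width: int

def conv_metrics(layer: ConvLayer, bytes_per_value: int = 4):
    """Return (MAC count, weight memory in bytes) for one conv layer."""
    # One multiply-accumulate per kernel element, per input channel,
    # per output pixel, per output channel.
    macs = (layer.kernel_size ** 2 * layer.in_channels
            * layer.out_channels * layer.out_height * layer.out_width)
    # Weight footprint: K*K*Cin*Cout values at the chosen precision.
    weight_bytes = (layer.kernel_size ** 2 * layer.in_channels
                    * layer.out_channels * bytes_per_value)
    return macs, weight_bytes

# Example: a first convolution in a LeNet-5-like network
# (32x32 grayscale input, 5x5 kernels, 6 output maps of 28x28).
layer = ConvLayer(in_channels=1, out_channels=6, kernel_size=5,
                  out_height=28, out_width=28)
macs, weight_bytes = conv_metrics(layer)
print(f"MACs: {macs}, weights: {weight_bytes} B")  # MACs: 117600, weights: 600 B
```

For a whole network, the same per-layer computation would simply be summed across layers, and analogous counts over input and output feature maps could feed the locality and reuse estimates the abstract alludes to.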
