TASO: Time and Space Optimization for Memory-Constrained DNN Inference

Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large networks, which are prohibitively expensive to run on mobile and embedded devices with tightly constrained memory and energy budgets. We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on an integer linear programming (ILP) formulation for selecting the primitive operations that implement the convolutional layers. We optimize the trade-off between execution time and memory consumption by: 1) selecting data layouts and primitive operations for each layer so as to minimize execution time across the whole network; and 2) allocating a shared workspace sized to the upper bound of the per-layer memory footprint. These two optimization strategies can be used to run any CNN on any platform with a C compiler. Our evaluation with a range of popular ImageNet architectures (GoogleNet, AlexNet, VGG, ResNet and SqueezeNet) on the ARM Cortex-A15 yields speedups of 8× compared to greedy primitive selection, and reduces memory requirements by 2.2× while sacrificing only 15% of inference time compared to a solver that considers inference time only. In addition, our optimization approach exposes a range of optimal configurations along the Pareto frontier of the memory/latency trade-off, which can be used under arbitrary system constraints.
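
To illustrate the kind of per-layer primitive selection described above, the following is a minimal sketch (not the authors' implementation) of an ILP that picks one convolution primitive per layer to minimize total execution time while keeping a shared workspace, sized to the largest per-layer requirement among the chosen primitives, under a memory budget. It omits data-layout selection, uses the PuLP library with its bundled CBC solver, and all layer names, timings, and workspace sizes are hypothetical profiling data.

import pulp

# Hypothetical profiling data: primitive -> (time_ms, workspace_bytes) per layer.
layers = {
    "conv1": {"direct": (9.0, 0), "im2col": (5.0, 4_000_000), "winograd": (4.0, 6_000_000)},
    "conv2": {"direct": (12.0, 0), "im2col": (7.0, 8_000_000), "winograd": (6.0, 10_000_000)},
}
budget = 8_000_000  # workspace budget in bytes (hypothetical)

prob = pulp.LpProblem("primitive_selection", pulp.LpMinimize)

# x[l, p] = 1 if primitive p is chosen for layer l.
x = {(l, p): pulp.LpVariable(f"x_{l}_{p}", cat="Binary")
     for l in layers for p in layers[l]}
# Shared workspace, sized to the largest requirement among chosen primitives.
w = pulp.LpVariable("workspace", lowBound=0)

# Objective: total execution time across all layers.
prob += pulp.lpSum(layers[l][p][0] * x[l, p] for l in layers for p in layers[l])

for l in layers:
    # Exactly one primitive per layer.
    prob += pulp.lpSum(x[l, p] for p in layers[l]) == 1
    for p in layers[l]:
        # Workspace must cover every chosen primitive's scratch memory.
        prob += w >= layers[l][p][1] * x[l, p]

# Memory constraint on the shared workspace.
prob += w <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in layers:
    chosen = [p for p in layers[l] if x[l, p].value() == 1]
    print(l, "->", chosen[0])
print("workspace bytes:", w.value())

Sweeping the budget parameter over a range of values and re-solving yields the kind of memory/latency Pareto frontier the abstract refers to.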
