DiReCtX: Dynamic Resource-Aware CNN Reconfiguration Framework for Real-Time Mobile Applications

Although convolutional neural networks (CNNs) are widely applied in cognitive applications, they remain computationally intensive for resource-constrained mobile systems. Many optimization techniques have been proposed to reduce the resource consumption of CNN computation in mobile deployment. However, most of them target only CNN model compression in terms of parameter size or model structure, ignoring the distinct resource constraints of mobile systems with respect to memory, energy, and real-time requirements. Moreover, previous works treat accuracy as the primary consideration and require a time-consuming retraining process to compensate for the inference accuracy lost after compression. To address these issues, we propose DiReCtX, a dynamic resource-aware CNN model reconfiguration framework. DiReCtX is built on a set of accurate CNN profiling models that estimate the consumption of different resources as well as inference accuracy. With manageable consumption/accuracy tradeoffs, DiReCtX can reconfigure a CNN model to meet distinct resource constraint types and levels while maintaining the expected inference performance. To further enable fast model reconfiguration in real time, DiReCtX also introduces an improved CNN model pruning method and corresponding accuracy tuning strategies. Experiments show that the proposed CNN profiling models achieve 94.6% and 97.1% accuracy for resource consumption estimation and inference accuracy estimation, respectively. Meanwhile, the proposed reconfiguration scheme achieves up to 44.44% computation acceleration, 31.69% memory reduction, and 32.39% energy saving. In field tests on state-of-the-art smartphones, DiReCtX adapts CNN models to various resource constraints in mobile application scenarios with optimal real-time performance.
