Boosting Mobile CNN Inference through Semantic Memory

The human brain is known to speed up visual recognition of repeatedly presented objects through faster memory encoding and access on activated neurons. For the first time, we borrow and distill such a capability into a semantic memory design, namely SMTM, to improve on-device CNN inference. SMTM employs a hierarchical memory architecture to leverage the long-tail distribution of objects of interest, and incorporates several novel techniques to put it into effect: (1) it encodes high-dimensional feature maps into low-dimensional semantic vectors for low-cost yet accurate caching and lookup; (2) it uses a novel metric to determine exit timing, accounting for the inherent characteristics of different layers; (3) it adaptively adjusts the cache size and semantic vectors to fit scene dynamics. SMTM is prototyped on a commodity CNN engine and runs on both mobile CPU and GPU. Extensive experiments on large-scale datasets and models show that SMTM significantly speeds up model inference over the standard approach (up to 2×) and prior cache designs (up to 1.5×), with acceptable accuracy loss.
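To make the caching idea concrete, below is a minimal Python sketch of a semantic cache of the kind the abstract describes: feature maps are collapsed into low-dimensional semantic vectors, matched against cached vectors by cosine similarity, and a sufficiently close hit lets inference exit early. The class name SemanticCache, the global-average-pooling encoder, the fixed capacity, and the FIFO eviction policy are illustrative assumptions for this sketch, not details taken from the paper.

    # Hypothetical sketch of a semantic cache for early-exit CNN inference.
    # Assumes NumPy only; names and eviction policy are illustrative.
    import numpy as np

    class SemanticCache:
        def __init__(self, capacity=64, threshold=0.9):
            self.capacity = capacity      # adaptive in the paper; fixed here
            self.threshold = threshold    # cosine similarity needed to exit early
            self.keys = []                # cached semantic vectors
            self.labels = []              # cached predictions

        @staticmethod
        def encode(feature_map):
            """Collapse a C x H x W feature map into a low-dimensional
            semantic vector by global average pooling, then L2-normalize."""
            vec = feature_map.mean(axis=(1, 2))
            return vec / (np.linalg.norm(vec) + 1e-8)

        def lookup(self, feature_map):
            """Return a cached label if a sufficiently similar vector exists."""
            if not self.keys:
                return None
            query = self.encode(feature_map)
            sims = np.stack(self.keys) @ query    # cosine similarities
            best = int(np.argmax(sims))
            return self.labels[best] if sims[best] >= self.threshold else None

        def insert(self, feature_map, label):
            """Cache the semantic vector of a fully inferred frame."""
            if len(self.keys) >= self.capacity:   # simple FIFO eviction
                self.keys.pop(0)
                self.labels.pop(0)
            self.keys.append(self.encode(feature_map))
            self.labels.append(label)

In use, the runtime would call lookup() on the feature map at a chosen intermediate layer; on a hit it returns the cached label and skips the remaining layers, otherwise it completes full inference and calls insert() with the final prediction.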

[1]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  J. Bargh,et al.  Nature of Priming Effects on Categorization , 1985 .

[3]  Dongsu Han,et al.  NEMO: enabling neural-enhanced video streaming on commodity mobile devices , 2020, MobiCom.

[4]  D. Albarracín,et al.  From primed concepts to action: A meta-analysis of the behavioral effects of incidentally presented words. , 2016, Psychological bulletin.

[5]  Xin Wang,et al.  SkipNet: Learning Dynamic Routing in Convolutional Networks , 2017, ECCV.

[6]  Juheon Yi,et al.  EagleEye: wearable camera-based person identification in crowded urban spaces , 2020, MobiCom.

[7]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xuanzhe Liu,et al.  Approximate Query Processing on Autonomous Cameras , 2019, ArXiv.

[9]  Dragomir Anguelov,et al.  Capturing Long-Tail Distributions of Object Subcategories , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Q. Tian,et al.  GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval , 2017, ACM Multimedia.

[11]  Jay L. Devore,et al.  A Modern Introduction to Probability and Statistics: Understanding Why and How , 2006 .

[12]  P. Tibbetts :Cognitive Neuroscience: The Biology of the Mind , 2009 .

[13]  Fengyuan Xu,et al.  EMO: real-time emotion recognition from single-eye images for resource-constrained eyewear devices , 2020, MobiSys.

[14]  Ran El-Yaniv,et al.  Binarized Neural Networks , 2016, NIPS.

[15]  Don L. Scarborough,et al.  Accessing lexical memory: The transfer of word repetition effects across task and modality , 1979 .

[16]  Rajesh Krishna Balan,et al.  DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications , 2017, MobiSys.

[17]  Youngki Lee,et al.  Heimdall: mobile GPU coordination platform for augmented reality applications , 2020, MobiCom.

[18]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[19]  Shahriar Nirjon,et al.  Fast and scalable in-memory deep multitask learning via neural weight virtualization , 2020, MobiSys.

[20]  Joshua B. Tenenbaum,et al.  Learning to share visual appearance for multiclass object detection , 2011, CVPR 2011.

[21]  Hantao Yao,et al.  Deep Representation Learning With Part Loss for Person Re-Identification , 2017, IEEE Transactions on Image Processing.

[22]  Luca Benini,et al.  CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data , 2017, ICDSC.

[23]  Xuanzhe Liu,et al.  A First Look at Deep Learning Apps on Smartphones , 2018, WWW.

[24]  Bo Hu,et al.  FoggyCache: Cross-Device Approximate Computation Reuse , 2018, MobiCom.

[25]  Suren Jayasuriya,et al.  EVA²: Exploiting Temporal Redundancy in Live Computer Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[26]  Xi Zhang,et al.  MDLdroidLite: A Release-and-Inhibit Control Approach to Resource-Efficient Deep Neural Networks on Mobile Devices , 2020, IEEE Transactions on Mobile Computing.

[27]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[28]  Xiao Zeng,et al.  MobileDeepPill: A Small-Footprint Mobile Deep Learning System for Recognizing Unconstrained Pill Images , 2017, MobiSys.

[29]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Colin Wei,et al.  Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss , 2019, NeurIPS.

[31]  Jason Cong,et al.  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[32]  Harrison Si,et al.  Handbook of Research Methods in Social and Personality Psychology: Author Index , 2013 .

[33]  Ilias Leontiadis,et al.  SPINN: synergistic progressive inference of neural networks over device and cloud , 2020, MobiCom.

[34]  Larry S. Davis,et al.  BlockDrop: Dynamic Inference Paths in Residual Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Xuanzhe Liu,et al.  DeepCache: Principled Cache for Mobile Deep Vision , 2017, MobiCom.

[36]  H. T. Kung,et al.  BranchyNet: Fast inference via early exiting from deep neural networks , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[37]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[38]  Yanzhi Wang,et al.  PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning , 2020, ASPLOS.

[39]  Daehyun Kim,et al.  μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization , 2019, EuroSys.

[40]  Li Zhang,et al.  Spatially Adaptive Computation Time for Residual Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  J. Murphy The General Data Protection Regulation (GDPR) , 2018, Irish medical journal.

[42]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[43]  Nicholas D. Lane,et al.  DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware , 2017, MobiSys.

[44]  Shahram Izadi,et al.  SenseCam: A Retrospective Memory Aid , 2006, UbiComp.

[45]  Xiaoxiao Li,et al.  Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Jason Cong,et al.  Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[47]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[48]  Li Bai,et al.  Cosine Similarity Metric Learning for Face Verification , 2010, ACCV.

[49]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[50]  Hermann Ebbinghaus (1885) Memory: A Contribution to Experimental Psychology , 2013, Annals of Neurosciences.

[51]  Wencong Xiao,et al.  SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  T. Chartrand,et al.  The mind in the middle: A practical guide to priming and automaticity research. , 2000 .

[53]  Weight-Dependent Gates for Differentiable Neural Network Pruning , 2020, ECCV Workshops.

[54]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[55]  R. Henson Neuroimaging studies of priming , 2003, Progress in Neurobiology.

[56]  Xiangyu Zhang,et al.  MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).