TKD: Temporal Knowledge Distillation for Active Perception

Deep neural network-based methods have been shown to achieve outstanding performance on object detection and classification tasks. Despite the significant performance gains of deep architectures, they still require prohibitive runtime to process images while maintaining the highest possible accuracy, which limits their use in real-time applications. Observing that the human visual system (HVS) relies heavily on temporal dependencies among frames of the visual input to recognize objects efficiently, we propose a novel framework dubbed TKD: temporal knowledge distillation. This framework distills temporal knowledge from a heavy neural network-based model over selected video frames (the perception of the moments) to a lightweight model. To enable the distillation, we put forward two novel procedures: 1) a Long Short-Term Memory (LSTM)-based keyframe selection method; and 2) a novel teacher-bounded loss design. To validate our approach, we conduct comprehensive empirical evaluations using different object detection methods over multiple datasets, including YouTube-Objects and the Hollywood scene dataset. Our results show consistent improvement in accuracy-speed trade-offs for object detection over the frames of dynamic scenes, compared to other modern object recognition methods. TKD maintains the desired accuracy at a throughput of around 220 images per second. Implementation: https://github.com/mfarhadi/TKD-Cloud.
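The abstract names a teacher-bounded loss but does not give its form. As a point of reference only, the sketch below implements one well-known teacher-bounded formulation, the bounded L2 regression loss of Chen et al. (NIPS 2017, "Learning Efficient Object Detection Models with Knowledge Distillation"): the student pays its own squared error only while that error plus a margin exceeds the teacher's error. Whether TKD uses exactly this form is an assumption, and the function names (`squared_l2`, `teacher_bounded_loss`, `batch_loss`) and the `margin` parameter are hypothetical illustrations. Plain Python lists stand in for bounding-box regression vectors to keep the sketch dependency-free.

```python
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def teacher_bounded_loss(student: Sequence[float],
                         teacher: Sequence[float],
                         target: Sequence[float],
                         margin: float = 0.0) -> float:
    """Teacher-bounded L2 regression loss in the style of Chen et al. (2017).

    Hypothetical helper: returns the student's squared error if
    student_err + margin > teacher_err, and 0.0 otherwise, so the teacher's
    error acts as an upper bound the student must beat by `margin`.
    """
    student_err = squared_l2(student, target)
    teacher_err = squared_l2(teacher, target)
    if student_err + margin > teacher_err:
        return student_err
    return 0.0


def batch_loss(students: List[Sequence[float]],
               teachers: List[Sequence[float]],
               targets: List[Sequence[float]],
               margin: float = 0.0) -> float:
    """Mean teacher-bounded loss over a batch of regression vectors."""
    if not students:
        raise ValueError("batch must not be empty")
    losses = [teacher_bounded_loss(s, t, y, margin)
              for s, t, y in zip(students, teachers, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    target, teacher = [0.0, 0.0], [2.0, 0.0]      # teacher error = 4
    print(teacher_bounded_loss([1.0, 0.0], teacher, target))       # student beats teacher
    print(teacher_bounded_loss([3.0, 0.0], teacher, target))       # student worse than teacher
    print(teacher_bounded_loss([1.0, 0.0], teacher, target, 5.0))  # margin re-enables the term
```

In Chen et al.'s formulation this bounded term is added, with a weight, to a smooth L1 loss against the ground-truth boxes; that term is omitted here. A real detector would compute the same comparison per box on tensors, for example in PyTorch, rather than on Python lists.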

[1]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[2]  Yezhou Yang,et al.  Convolutional Neural Networks: Ensemble Modeling, Fine-Tuning and Unsupervised Semantic Localization , 2017, J. Vis. Commun. Image Represent..

[3]  Yingyan Lou,et al.  Crossroads+ , 2019, ACM Trans. Cyber Phys. Syst..

[4]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[5]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[6]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[7]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[8]  Srikanth Saripalli,et al.  Drone Detection Using Depth Maps , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[9]  Mubarak Shah,et al.  Real-World Anomaly Detection in Surveillance Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[11]  Lucas Beyer,et al.  Detection- Tracking for Efficient Person Analysis: The DetTA Pipeline , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[12]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[13]  Yezhou Yang,et al.  Convolutional Neural Networks: Ensemble Modeling, Fine-Tuning and Unsupervised Semantic Localization , 2017, ArXiv.

[14]  Xin Ye,et al.  Active Object Perceiver: Recognition-Guided Policy Learning for Object Searching on Mobile Robots , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[16]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[17]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[18]  Zhiguo Cao,et al.  When Unsupervised Domain Adaptation Meets Tensor Representations , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Jitendra Malik,et al.  Cross Modal Distillation for Supervision Transfer , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yezhou Yang,et al.  A Novel Design of Adaptive and Hierarchical Convolutional Neural Networks using Partial Reconfiguration on FPGA , 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC).

[21]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[22]  Xin Ye,et al.  GAPLE: Generalizable Approaching Policy LEarning for Robotic Object Searching in Indoor Environment , 2018, IEEE Robotics and Automation Letters.

[23]  M. Webster,et al.  Visual adaptation: Neural, psychological and computational aspects , 2007, Vision Research.

[24]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[25]  Cordelia Schmid,et al.  Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Trevor Darrell,et al.  Deep Domain Confusion: Maximizing for Domain Invariance , 2014, CVPR 2014.

[27]  Luc Van Gool,et al.  Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[28]  Leila Bowman,et al.  in the office , 1961 .

[29]  Lakshmish Ramaswamy,et al.  Edge-Based Anomalous Sensor Placement Detection for Participatory Sensing of Urban Heat Islands , 2018, 2018 IEEE International Smart Cities Conference (ISC2).

[30]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[31]  Ran El-Yaniv,et al.  Binarized Neural Networks , 2016, NIPS.

[32]  Jianxiong Xiao,et al.  DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Tony X. Han,et al.  Learning Efficient Object Detection Models with Knowledge Distillation , 2017, NIPS.

[35]  Cordelia Schmid,et al.  Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[37]  Luc Van Gool,et al.  Domain Adaptive Faster R-CNN for Object Detection in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[39]  Aviral Shrivastava,et al.  RIM: Robust Intersection Management for Connected Autonomous Vehicles , 2018, 2018 IEEE Real-Time Systems Symposium (RTSS).

[40]  Subhransu Maji,et al.  Adapting Models to Signal Degradation using Distillation , 2017, BMVC.

[41]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[42]  Lakshmish Ramaswamy,et al.  SCOUTS: A Smart Community Centric Urban Heat Monitoring Framework , 2018, ARIC@SIGSPATIAL.

[43]  M. Webster Visual Adaptation. , 2015, Annual review of vision science.

[44]  Gregory D. Hager,et al.  Learning convolutional action primitives for fine-grained action recognition , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[45]  Deva Ramanan,et al.  Online Model Distillation for Efficient Video Inference , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[46]  黄耀春 The Pursuit of Happyness赏析 , 2010 .

[47]  Ming Yang,et al.  Compressing Deep Convolutional Networks using Vector Quantization , 2014, ArXiv.

[48]  Rakesh Mehta,et al.  Object detection at 200 Frames Per Second , 2018, ECCV Workshops.

[49]  Guang-Zhong Yang,et al.  Deep Learning for Health Informatics , 2017, IEEE Journal of Biomedical and Health Informatics.

[50]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Jie Tang,et al.  Enabling Deep Learning on IoT Devices , 2017, Computer.

[52]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[53]  Mehdi Kamal,et al.  POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator , 2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[54]  Georgios Fainekos,et al.  Worst-case Satisfaction of STL Specifications Using Feedforward Neural Network Controllers , 2019, ACM Trans. Embed. Comput. Syst..