Teaching Where to See: Knowledge Distillation-Based Attentive Information Transfer in Vehicle Maker Classification

Deep neural networks (DNNs) have been applied to various fields and achieved high performance. However, they require significant computing resources because of their numerous parameters, even though many of those parameters are redundant and contribute little to performance. Recently, many knowledge distillation-based methods have been proposed to address this problem by compressing a large DNN model into a small one. In this paper, we propose a novel knowledge distillation method that compresses a vehicle maker classification system based on a cascaded convolutional neural network (CNN) into a single CNN. The system uses mask regions with CNN features (Mask R-CNN) as a preprocessor for vehicle region detection, followed by a CNN classifier. Because the preprocessor supplies the classifier with a background-removed vehicle image, the classifier can pay more attention to the vehicle region. With this cascaded structure, the system classifies vehicle makers with about 91% accuracy. Moreover, when we compress the system into a single CNN through the proposed knowledge distillation method, it still achieves about 89% accuracy, a loss of only about 2%. Our experimental results show that the proposed method is superior to the conventional knowledge distillation method in terms of performance transfer.
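The abstract does not spell out the proposed attentive transfer loss, so the sketch below only illustrates the conventional soft-target knowledge distillation baseline that the paper compares against, applied to this teacher-student setup: the cascaded system (Mask R-CNN preprocessor plus CNN classifier) acts as the teacher, and the single CNN acts as the student. The model names, the temperature, and the mixing weight are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Conventional soft-target distillation loss (temperature-scaled KL
    divergence to the teacher plus cross-entropy to the hard labels).
    Hyperparameters here are placeholders, not the paper's values."""
    # Soften both distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence between softened distributions; the T^2 factor keeps
    # its gradient magnitude comparable to the hard-label term.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth vehicle-maker labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# One hedged training step: the cascaded teacher sees the background-removed
# vehicle crop produced by Mask R-CNN, while the single-CNN student sees the
# raw image. `teacher_cnn`, `student_cnn`, `masked_vehicle_image`, `raw_image`,
# and `maker_labels` are hypothetical names for illustration only.
def training_step(teacher_cnn, student_cnn, masked_vehicle_image,
                  raw_image, maker_labels, optimizer):
    with torch.no_grad():
        teacher_logits = teacher_cnn(masked_vehicle_image)
    student_logits = student_cnn(raw_image)
    loss = distillation_loss(student_logits, teacher_logits, maker_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```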
