In Teacher We Trust: Learning Compressed Models for Pedestrian Detection

Deep convolutional neural networks continue to advance the state-of-the-art in many domains as they grow bigger and more complex. It has been observed that many of the parameters of a large network are redundant, allowing for the possibility of learning a smaller network that mimics the outputs of the large network through a process called Knowledge Distillation. We show, however, that standard Knowledge Distillation is not effective for learning small models for the task of pedestrian detection. To improve this process, we introduce a higher-dimensional hint layer to increase information flow. We also estimate the variance in the outputs of the large network and propose a loss function to incorporate this uncertainty. Finally, we attempt to boost the complexity of the small network without increasing its size by using as input hand-designed features that have been demonstrated to be effective for pedestrian detection. We succeed in training a model that contains $400\times$ fewer parameters than the large network while outperforming AlexNet on the Caltech Pedestrian Dataset.

[1]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[2]  Yixin Chen,et al.  Compressing Convolutional Neural Networks , 2015, ArXiv.

[3]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[4]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[5]  Shih-Fu Chang,et al.  An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Song Han,et al.  A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding , 2015 .

[8]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[9]  Ivan Oseledets,et al.  Tensor-Train Decomposition , 2011, SIAM J. Sci. Comput..

[10]  Bernt Schiele,et al.  Ten Years of Pedestrian Detection, What Have We Learned? , 2014, ECCV Workshops.

[11]  Misha Denil,et al.  Predicting Parameters in Deep Learning , 2014 .

[12]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Marian Verhelst,et al.  Energy-efficient ConvNets through approximate computing , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[15]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[16]  Guido Sanguinetti,et al.  Advances in Neural Information Processing Systems 24 , 2011 .

[17]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[18]  Alexander Novikov,et al.  Tensorizing Neural Networks , 2015, NIPS.

[19]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[20]  Gregory J. Wolff,et al.  Optimal Brain Surgeon and general network pruning , 1993, IEEE International Conference on Neural Networks.

[21]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[22]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[23]  Bernt Schiele,et al.  Filtered channel features for pedestrian detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[25]  R. Venkatesh Babu,et al.  Data-free Parameter Pruning for Deep Neural Networks , 2015, BMVC.

[26]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[27]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Lorien Y. Pratt,et al.  Comparing Biases for Minimal Network Construction with Back-Propagation , 1988, NIPS.

[30]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.