Deformable Part Models with CNN Features

In this work we report on progress in integrating deep convolutional features with Deformable Part Models (DPMs). We replace the Histogram-of-Gradient (HOG) features of DPMs with Convolutional Neural Network (CNN) features obtained from the fifth and top-most convolutional layer of Krizhevsky's network [6]. This yields a substantial boost in performance (+14.5 mAP) over the baseline HOG-based models. The improvement only partially bridges the gap between DPMs and the currently top-performing R-CNN method of [1], suggesting that more radical changes to DPMs may be needed.
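
The feature substitution described above can be illustrated with a minimal sketch of how a linear DPM-style filter is scored densely over a feature map. This is not the authors' implementation: the function name `filter_response`, the array sizes, and the random arrays standing in for real HOG or CNN features are all assumptions made for illustration. The point it shows is that the sliding-window scoring is indifferent to the feature type, so moving from HOG cells to fifth-layer CNN activations (256 channels in Krizhevsky's network) changes only the channel dimension of the feature map and of the filter.

```python
import numpy as np


def filter_response(features, template):
    """Score a linear template at every valid position of a feature map.

    features: array of shape (H, W, C), one C-dimensional descriptor per cell.
    template: array of shape (h, w, C), the filter weights.
    Returns an array of shape (H - h + 1, W - w + 1) holding the dot product
    between the template and each h-by-w window of the feature map.
    """
    H, W, C = features.shape
    h, w, c = template.shape
    if c != C:
        raise ValueError("template and feature map must share the channel dimension")
    if h > H or w > W:
        raise ValueError("template is larger than the feature map")

    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = features[i:i + h, j:j + w, :]
            scores[i, j] = np.sum(window * template)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Placeholder feature maps: 32 channels stands in for a HOG-like
    # descriptor, 256 for fifth-layer CNN activations. Values are random.
    hog_like = rng.standard_normal((20, 30, 32))
    cnn_like = rng.standard_normal((20, 30, 256))

    # Hypothetical 6x8 root filters, one per feature type.
    root_hog = rng.standard_normal((6, 8, 32))
    root_cnn = rng.standard_normal((6, 8, 256))

    # Both calls use the same scoring code; only the channel count differs.
    print(filter_response(hog_like, root_hog).shape)
    print(filter_response(cnn_like, root_cnn).shape)
```

A full DPM additionally scores part filters and subtracts deformation costs; the sketch covers only the filter-response step that the feature substitution affects directly.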

[1] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Jitendra Malik, et al. Training Deformable Part Models with Decorrelated Features, 2013, 2013 IEEE International Conference on Computer Vision.

[3] Forrest N. Iandola, et al. DenseNet: Implementing Efficient ConvNet Descriptor Pyramids, 2014, ArXiv.

[4] François Fleuret, et al. Exact Acceleration of Linear Object Detectors, 2012, ECCV.

[5] David A. McAllester, et al. Object Detection with Discriminatively Trained Part Based Models, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[7] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8] Jian Sun, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Xiang Zhang, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2013, ICLR.

[10] Deva Ramanan, et al. Face detection, pose estimation, and landmark localization in the wild, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Yi Yang, et al. Articulated Human Detection with Flexible Mixtures of Parts, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Koen E. A. van de Sande, et al. Selective Search for Object Recognition, 2013, International Journal of Computer Vision.