Hand-Crafted vs Deep Features: A Quantitative Study of Pedestrian Appearance Model

We propose a deep discriminative appearance model (D-DAM) for pedestrians, based on a convolutional neural network (CNN). Training the supervised D-DAM does not require a large amount of data. We introduce a progressive batch refinement technique to fine-tune the CNN for modeling pedestrian appearance. After fine-tuning, the model achieves 98% accuracy on pedestrian versus non-pedestrian classification. We also introduce a novel discrimination index (DI) for evaluating the spatio-temporal discrimination effectiveness of both hand-crafted and deep features. We run experiments on a pre-trained CNN model, our D-DAM, and three baseline hand-crafted features: HoG, LBP, and color histogram. The results show that D-DAM achieves higher classification accuracy and better spatio-temporal discrimination ability than all of the hand-crafted features.
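
To make the idea of a feature's discrimination ability concrete, the sketch below computes one of the hand-crafted baselines named above, a color histogram, for two image patches and measures how far apart the two descriptors are. The abstract does not define the DI, so this is not the DI: the Jensen-Shannon divergence is used here only as an assumed stand-in because it is symmetric and bounded. The function names, the bin count, and the toy patches are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    if not pixels:
        raise ValueError("patch must contain at least one pixel")
    hist = [0.0] * (bins ** 3)
    step = 256 / bins  # width of one bin along each channel
    for r, g, b in pixels:
        idx = (int(r // step) * bins + int(g // step)) * bins + int(b // step)
        hist[idx] += 1.0
    total = float(len(pixels))
    return [h / total for h in hist]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    def kl(a: Sequence[float], m: Sequence[float]) -> float:
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, m) if x > 0)

    mix = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


# Toy patches (assumed data): one uniformly red, one uniformly blue.
red_patch = [(220, 10, 10)] * 16
blue_patch = [(10, 10, 220)] * 16

h_red = color_histogram(red_patch)
h_blue = color_histogram(blue_patch)

same_target = js_divergence(h_red, h_red)        # descriptor vs itself
different_targets = js_divergence(h_red, h_blue)  # two distinct appearances
```

A feature that discriminates well keeps the distance small for the same pedestrian across frames and large between different pedestrians. The same comparison could be run on any descriptor that can be normalized to a distribution.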
