Deconvolution single shot multibox detector for supermarket commodity detection and classification

This paper proposes an image detection model to detect and classify supermarkets shelves’ commodity. Based on the principle of the features directly affects the accuracy of the final classification, feature maps are performed to combine high level features with bottom level features. Then set some fixed anchors on those feature maps, finally the label and the position of commodity is generated by doing a box regression and classification. In this work, we proposed a model named Deconvolutiuon Single Shot MultiBox Detector, we evaluated the model using 300 images photographed from real supermarket shelves. Followed the same protocol in other recent methods, the results showed that our model outperformed other baseline methods.

[1]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[5]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[9]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[12]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.