Recognizing Products: A Per-exemplar Multi-label Image Classification Approach

Large-scale instance-level image retrieval aims at retrieving specific instances of objects or scenes. Simultaneously retrieving multiple objects in a test image adds to the difficulty of the problem, especially if the objects are visually similar. This paper presents an efficient approach for per-exemplar multi-label image classification, which targets the recognition and localization of products in retail store images. We achieve runtime efficiency through the use of discriminative random forests, deformable dense pixel matching and genetic algorithm optimization. Cross-dataset recognition is performed, where our training images are taken in ideal conditions with only one single training image per product label, while the evaluation set is taken using a mobile phone in real-life scenarios in completely different conditions. In addition, we provide a large novel dataset and labeling tools for products image search, to motivate further research efforts on multi-label retail products image classification. The proposed approach achieves promising results in terms of both accuracy and runtime efficiency on 680 annotated images of our dataset, and 885 test images of GroZi-120 dataset. We make our dataset of 8350 different product images and the 680 test images from retail stores with complete annotations available to the wider community.

[1]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[2]  Goldberg,et al.  Genetic algorithms , 1993, Robust Control Systems with Genetic Algorithms.

[3]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[4]  Jiebo Luo,et al.  Learning multi-label scene classification , 2004, Pattern Recognit..

[5]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[6]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[7]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Rong Jin,et al.  Correlated Label Propagation with Application to Multi-label Learning , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[11]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[12]  Michele Merler,et al.  Recognizing Groceries in situ Using in vitro Training Data , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[14]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[15]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Xiaofan Lin,et al.  Visual search engine for product images , 2008, Electronic Imaging.

[17]  Shumeet Baluja,et al.  Pagerank for product image search , 2008, WWW.

[18]  Víctor Robles,et al.  Feature selection for multi-label naive Bayes classification , 2009, Inf. Sci..

[19]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[20]  Shihong Lao,et al.  Boosting Associated Pairing Comparison Features for pedestrian detection , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[21]  Kusum Deep,et al.  A real coded genetic algorithm for solving integer and mixed integer optimization problems , 2009, Appl. Math. Comput..

[22]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Serge J. Belongie,et al.  Toward real-time grocery detection for the visually impaired , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[25]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Bernd Girod,et al.  Mobile product recognition , 2010, ACM Multimedia.

[27]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[28]  Pietro Perona,et al.  Visual Recognition with Humans in the Loop , 2010, ECCV.

[29]  Thomas Deselaers,et al.  ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[30]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[31]  Fei-Fei Li,et al.  Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[32]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[33]  Ying Wu,et al.  Mobile Product Image Search by Automatic Query Object Extraction , 2012, ECCV.

[34]  Matthieu Guillaumin,et al.  Segmentation Propagation in ImageNet , 2012, ECCV.

[35]  Frédéric Jurie,et al.  Improving Image Classification Using Semantic Attributes , 2012, International Journal of Computer Vision.

[36]  Cordelia Schmid,et al.  Discriminative spatial saliency for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Ce Liu,et al.  Deformable Spatial Pyramid Matching for Fast Dense Correspondences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.