Domain invariant hierarchical embedding for grocery products recognition

Abstract Recognizing packaged grocery products based solely on appearance is still an open issue for modern computer vision systems due to peculiar challenges. Firstly, the number of different items to be recognized is huge (i.e., in the order of thousands) and rapidly changing over time. Moreover, there exist a significant domain shift between the images that should be recognized at test time, taken in stores by cheap cameras, and those available for training, usually just one or a few studio-quality images per product. We propose an end-to-end architecture comprising a GAN to address the domain shift at training time and a deep CNN trained on the samples generated by the GAN to learn an embedding of product images that enforces a hierarchy between product categories. At test time, we perform recognition by means of K-NN search against a database consisting of just one reference image per product. Experiments addressing recognition of products present in the training datasets as well as different ones unseen at training time show that our approach compares favorably to state-of-the-art methods on the grocery recognition task and generalize fairly well to similar ones.

[1]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Sanja Fidler,et al.  Lost Shopping! Monocular Localization in Large Indoor Spaces , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  H. A. Ananya,et al.  Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce , 2017, ArXiv.

[5]  Bharath Hariharan,et al.  Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[7]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[10]  Christian Floerkemeier,et al.  Recognizing Products: A Per-exemplar Multi-label Image Classification Approach , 2014, ECCV.

[11]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[14]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[15]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[16]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[17]  Giovanni Maria Farinella,et al.  Market basket analysis from egocentric videos , 2018, Pattern Recognit. Lett..

[18]  Tao Mei,et al.  Deep Semantic Hashing with Generative Adversarial Networks , 2017, SIGIR.

[19]  Davide Maltoni,et al.  Grocery product detection and recognition , 2017, Expert Syst. Appl..

[20]  Svetlana Lazebnik,et al.  Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[22]  Chenxi Liu,et al.  ScaleNet: Guiding Object Proposal Generation in Supermarkets and Beyond , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[24]  Asaf Tzadok,et al.  Fine-Grained Recognition of Thousands of Object Categories with Single-Example Training , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Shih-Fu Chang,et al.  Learning Spread-Out Local Feature Descriptors , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Luigi di Stefano,et al.  Product Recognition in Store Shelves as a Sub-Graph Isomorphism Problem , 2017, ICIAP.

[28]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[29]  Roberto Pierdicca,et al.  Robust and affordable retail customer profiling by vision and radio beacon sensor fusion , 2016, Pattern Recognit. Lett..

[30]  Kristen Grauman,et al.  Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation , 2013, ICML.

[31]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[32]  Müjdat Çetin,et al.  Context-aware hybrid classification system for fine-grained retail product recognition , 2016, 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP).

[33]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[34]  Yu Zhou,et al.  Matching User Photos to Online Products with Robust Deep Features , 2016, ICMR.

[35]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[37]  Emanuele Frontoni,et al.  Mobile robot for retail surveying and inventory using visual and textual analysis of monocular pictures based on deep learning , 2017, 2017 European Conference on Mobile Robots (ECMR).

[38]  Narayanan Vijaykrishnan,et al.  A hardware accelerated multilevel visual classifier for embedded visual-assist systems , 2014, 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[39]  Michele Merler,et al.  Recognizing Groceries in situ Using in vitro Training Data , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[41]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[42]  Alexander J. Smola,et al.  Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Narayanan Vijaykrishnan,et al.  Visual co-occurrence network: using context for large-scale object recognition in retail , 2015, 2015 13th IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia).

[44]  Feng Zhou,et al.  Embedding Label Structures for Fine-Grained Feature Representation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).