MetaSearch: Incremental Product Search via Deep Meta-Learning

With advances in image processing and computer vision, content-based product search is now applied in a wide variety of common tasks, such as online shopping, automatic checkout systems, and intelligent logistics. Given a product image as a query, existing product search systems mainly perform retrieval over predefined databases with fixed product categories. However, real-world applications often require inserting new categories or updating existing products in the product database. With existing product search methods, the image feature extraction models must be retrained and the database indexes rebuilt to accommodate the updated data, and these operations incur high costs in data annotation and training time. To address this, we propose a few-shot incremental product search framework based on meta-learning, which requires only a few annotated images and a modest training time. In particular, our framework contains a multipooling-based product feature extractor that learns a discriminative representation for each product, and a meta-learning-based feature adapter designed to make the few-shot features robust. Furthermore, when new categories are added in batches, we reconstruct the few-shot features with an incremental weight combiner to accommodate the incremental search task. Through extensive experiments, we demonstrate that the proposed framework achieves strong performance on new products while preserving high search accuracy on the base categories as new product categories are gradually added, without forgetting.
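
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of adding a category from a few images without retraining or rebuilding the index. Everything in it is an assumption made for illustration: `multi_pool` concatenates global max and average pooling as one possible reading of "multipooling", and `PrototypeIndex` averages the few-shot descriptors into a class prototype as a simple stand-in for the feature adapter and incremental weight combiner, which it does not reproduce.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def multi_pool(feature_map):
    """Hypothetical multi-pooling descriptor for a (C, H, W) feature map.

    Assumption: global max pooling and global average pooling are
    concatenated; the abstract does not say which poolings are used.
    """
    flat = feature_map.reshape(feature_map.shape[0], -1)
    pooled = np.concatenate([flat.max(axis=1), flat.mean(axis=1)])
    return l2_normalize(pooled)


class PrototypeIndex:
    """Hypothetical search index holding one prototype vector per category."""

    def __init__(self, dim):
        self.labels = []
        self.prototypes = np.empty((0, dim))

    def add_category(self, label, shot_descriptors):
        # Assumption: the prototype is the normalized mean of the few-shot
        # descriptors. Existing prototypes are left untouched, so base
        # categories are not altered when a new category is appended.
        prototype = l2_normalize(np.mean(shot_descriptors, axis=0))
        self.labels.append(label)
        self.prototypes = np.vstack([self.prototypes, prototype])

    def search(self, query, top_k=1):
        # Rank categories by cosine similarity to the query descriptor.
        scores = self.prototypes @ l2_normalize(query)
        order = np.argsort(-scores)[:top_k]
        return [(self.labels[i], float(scores[i])) for i in order]
```

In this sketch a new category costs one call to `add_category` with a handful of descriptors, which is the property the abstract emphasizes; a learned adapter and weight combiner would replace the plain mean.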
