Domain Balanced Sampling and Iterative Search for Product Identification

This paper introduces our solution to the 1st Workshop on Multimodal Product Identification in Livestreaming and Watch and Buy Challenge, a real-world task in a live stream scene and is challenging due to factors such as lighting, occlusion, and cross-domain. We model this task as the object detection and image retrieval problem. In the whole pipeline, we mainly focus on the problem of cross-domain retrieval and propose a domain-balanced sampling method, which enhances the robustness of multi-domain retrieval. Besides, to eliminate the influence of irrelevant clothing, we propose an iterative cross-search strategy, which greatly improves the accuracy of matching. In addition, we also experiment with the exploitation of text information, including multimodal product classification and multimodal intent recognition. With the aforementioned method, we achieved an F1 score of 69.2% and finally achieve first place in the competition.

[1]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jiashi Feng,et al.  The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation , 2020, ECCV.

[4]  Matthew R. Scott,et al.  Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Menglong Zhu,et al.  Detect-To-Retrieve: Efficient Regional Aggregation for Image Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Simon Osindero,et al.  Cross-Dimensional Weighting for Aggregated Deep Convolutional Features , 2015, ECCV Workshops.

[10]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[11]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[12]  Kaiming He,et al.  Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.

[13]  Haibo He,et al.  Learning from Imbalanced Data , 2009, IEEE Transactions on Knowledge and Data Engineering.

[14]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[15]  Yu Wang,et al.  Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets , 2020, ECCV.

[16]  Wei Jiang,et al.  Bag of Tricks and a Strong Baseline for Deep Person Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[17]  Yannis Avrithis,et al.  Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[19]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Nir Ailon,et al.  Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[21]  Seetha Hari,et al.  Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[22]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[24]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Sheng Tang,et al.  Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Stella X. Yu,et al.  Large-Scale Long-Tailed Recognition in an Open World , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[28]  Colin Wei,et al.  Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss , 2019, NeurIPS.

[29]  Jian Cheng,et al.  Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Luc Van Gool,et al.  Efficient Non-Maximum Suppression , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[32]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[33]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Yichen Wei,et al.  Circle Loss: A Unified Perspective of Pair Similarity Optimization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Victor S. Lempitsky,et al.  Aggregating Deep Convolutional Features for Image Retrieval , 2015, ArXiv.

[38]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[39]  Ilya Sutskever,et al.  Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.

[40]  Quoc V. Le,et al.  Learning Data Augmentation Strategies for Object Detection , 2019, ECCV.

[41]  Sheng Tang,et al.  Robust common visual pattern discovery using graph matching , 2013, J. Vis. Commun. Image Represent..

[42]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[43]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[44]  Yi Jiang,et al.  Learning to Segment the Tail , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Chen Huang,et al.  Deep Imbalanced Learning for Face Recognition and Attribute Prediction , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Giorgos Tolias,et al.  Fine-Tuning CNN Image Retrieval with No Human Annotation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Ondrej Chum,et al.  CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples , 2016, ECCV.

[48]  Kai Chen,et al.  Seesaw Loss for Long-Tailed Instance Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Yongdong Zhang,et al.  Implicit Negative Sub-Categorization and Sink Diversion for Object Detection , 2018, IEEE Transactions on Image Processing.

[51]  Ross B. Girshick,et al.  LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Xiaoou Tang,et al.  Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net , 2018, ECCV.

[53]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[54]  Sheng Tang,et al.  Object Localization Based on Proposal Fusion , 2017, IEEE Transactions on Multimedia.

[55]  Stephen Lin,et al.  Deformable ConvNets V2: More Deformable, Better Results , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[57]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.