Dynamic Relevance Learning for Few-Shot Object Detection

Expensive bounding-box annotations have limited the development of object detection task. Thus, it is necessary to focus on more challenging task of few-shot object detection. It requires the detector to recognize objects of novel classes with only a few training samples. Nowadays, many existing popular methods based on meta-learning have achieved promising performance, such as Meta R-CNN series. However, only a single category of support data is used as the attention to guide the detecting of query images each time. Their relevance to each other remains unexploited. Moreover, a lot of recent works treat the support data and query images as independent branch without considering the relationship between them. To address this issue, we propose a dynamic relevance learning model, which utilizes the relationship between all support images and Region of Interest (RoI) on the query images to construct a dynamic graph convolutional network (GCN). By adjusting the prediction distribution of the base detector using the output of this GCN, the proposed model can guide the detector to improve the class representation implicitly. Comprehensive experiments have been conducted on Pascal VOC and MS-COCO dataset. The proposed model achieves the best overall performance, which shows its effectiveness of learning more generalized features. Our code is available at https://github.com/liuweijie19980216/DRL-for-FSOD.

[1]  Amos J. Storkey,et al.  Data Augmentation Generative Adversarial Networks , 2017, ICLR 2018.

[2]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[3]  Trevor Darrell,et al.  Frustratingly Simple Few-Shot Object Detection , 2020, ICML.

[4]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[5]  Deva Ramanan,et al.  Meta-Learning to Detect Rare Objects , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Yu-Chiang Frank Wang,et al.  Multi-label Zero-Shot Learning with Structured Knowledge Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Xiaodan Liang,et al.  Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Razvan Pascanu,et al.  Memory-based Parameter Adaptation , 2018, ICLR.

[9]  Shuqiang Jiang,et al.  Attribute-Guided Feature Learning for Few-Shot Image Recognition , 2021, IEEE Transactions on Multimedia.

[10]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Aniwat Phaphuangwittayakul,et al.  Fast Adaptive Meta-Learning for Few-Shot Image Generation , 2021, IEEE Transactions on Multimedia.

[12]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[13]  Gustavo Carneiro,et al.  Smart Mining for Deep Metric Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[15]  Martial Hebert,et al.  Low-Shot Learning from Imaginary Data , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Sharath Pankanti,et al.  RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[18]  Heng Tao Shen,et al.  One-Shot Image-to-Image Translation via Part-Global Learning With a Multi-Adversarial Framework , 2019, IEEE Transactions on Multimedia.

[19]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[20]  Hao Chen,et al.  LSTD: A Low-Shot Transfer Detector for Object Detection , 2018, AAAI.

[21]  Yaliang Li,et al.  Simple and Deep Graph Convolutional Networks , 2020, ICML.

[22]  Qiang Wu,et al.  Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification , 2019, IEEE Transactions on Multimedia.

[23]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[24]  Marcello Pelillo,et al.  The Group Loss for Deep Metric Learning , 2020, ECCV.

[25]  Renaud Marlet,et al.  Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild , 2020, ECCV.

[26]  Junyu Gao,et al.  Learning Dual-Pooling Graph Neural Networks for Few-Shot Video Classification , 2021, IEEE Transactions on Multimedia.

[27]  Xin Wang,et al.  Few-Shot Object Detection via Feature Reweighting , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[30]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Geoffrey E. Hinton,et al.  Visualizing non-metric similarities in multiple maps , 2011, Machine Learning.

[33]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[34]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[35]  Abhinav Gupta,et al.  Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Qi Qian,et al.  SoftTriple Loss: Deep Metric Learning Without Triplet Sampling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  James T. Kwok,et al.  Generalizing from a Few Examples , 2019, ACM Comput. Surv..

[39]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[40]  Hao Wang,et al.  Rethinking Knowledge Graph Propagation for Zero-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Yu-Wing Tai,et al.  Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[45]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[46]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Wei Wu,et al.  Class-wise Dynamic Graph Convolution for Semantic Segmentation , 2020, ECCV.

[48]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[49]  Hwann-Tzong Chen,et al.  One-Shot Object Detection with Co-Attention and Co-Excitation , 2019, NeurIPS.

[50]  Lu Liu,et al.  Attribute Propagation Network for Graph Zero-Shot Learning , 2020, AAAI.

[51]  Bharath Hariharan,et al.  Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[52]  Qinghua Zheng,et al.  Semantics-Preserving Graph Propagation for Zero-Shot Object Detection , 2020, IEEE Transactions on Image Processing.

[53]  Yair Movshovitz-Attias,et al.  No Fuss Distance Metric Learning Using Proxies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).