Few-Shot Object Detection via Knowledge Transfer

Conventional methods for object detection usually require substantial amounts of training data and annotated bounding boxes. If there are only a few training data and annotations, the object detectors easily overfit and fail to generalize. It exposes the practical weakness of the object detectors. On the other hand, human can easily master new reasoning rules with only a few demonstrations using previously learned knowledge. In this paper, we introduce a few-shot object detection via knowledge transfer, which aims to detect objects from a few training examples. Central to our method is prototypical knowledge transfer with an attached meta-learner. The meta-learner takes support set images that include the few examples of the novel categories and base categories, and predicts prototypes that represent each category as a vector. Then, the prototypes reweight each RoI (Region-of-Interest) feature vector from a query image to remodels R-CNN predictor heads. To facilitate the remodeling process, we predict the prototypes under a graph structure, which propagates information of the correlated base categories to the novel categories with explicit guidance of prior knowledge that represents correlations among categories. Extensive experiments on the PASCAL VOC dataset verifies the effectiveness of the proposed method.

[1]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jihun Park,et al.  Accurate object contour tracking based on boundary edge selection , 2007, Pattern Recognit..

[3]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[4]  Abhinav Gupta,et al.  The More You Know: Using Knowledge Graphs for Image Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Xin Wang,et al.  Few-Shot Object Detection via Feature Reweighting , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[8]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[9]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[10]  Seong-Whan Lee,et al.  Facial component extraction and face recognition with support vector machines , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[11]  Hao Wang,et al.  Rethinking Knowledge Graph Propagation for Zero-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[13]  Zhiwu Lu,et al.  Large-Scale Few-Shot Learning: Knowledge Transfer With Class Hierarchy , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[15]  S-W Lee,et al.  Biologically Motivated Computer Vision , 2000, Lecture Notes in Computer Science.

[16]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[19]  Seong-Whan Lee,et al.  Reconstruction of 3D human body pose from stereo image sequences based on top-down learning , 2007, Pattern Recognit..

[20]  Wei-Lun Chao,et al.  Classifier and Exemplar Synthesis for Zero-Shot Learning , 2018, International Journal of Computer Vision.

[21]  Liang Lin,et al.  Knowledge-Embedded Routing Network for Scene Graph Generation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[23]  Anil K. Jain,et al.  Nighttime face recognition at large standoff: Cross-distance and cross-spectral matching , 2014, Pattern Recognit..

[24]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Nikos Komodakis,et al.  Dynamic Few-Shot Visual Learning Without Forgetting , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Xiaodan Liang,et al.  Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Shengcai Liao,et al.  Nighttime Face Recognition at Long Distance: Cross-Distance and Cross-Spectral Matching , 2012, ACCV.

[28]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[29]  Seong-Whan Lee,et al.  View-independent human action recognition with Volume Motion Template on single stereo camera , 2010, Pattern Recognit. Lett..

[30]  Abhinav Gupta,et al.  Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Anil K. Jain,et al.  Face Tracking and Recognition at a Distance: A Coaxial and Concentric PTZ Camera System , 2013, IEEE Transactions on Information Forensics and Security.