论文信息 - LSTD: A Low-Shot Transfer Detector for Object Detection

LSTD: A Low-Shot Transfer Detector for Object Detection

Recent advances in object detection are mainly driven by deep learning with large-scale detection benchmarks. However, the fully-annotated training set is often limited for a target detection task, which may deteriorate the performance of deep detectors. To address this challenge, we propose a novel low-shot transfer detector (LSTD) in this paper, where we leverage rich source-domain knowledge to construct an effective target-domain detector with very few training examples. The main contributions are described as follows. First, we design a flexible deep architecture of LSTD to alleviate transfer difficulties in low-shot detection. This architecture can integrate the advantages of both SSD and Faster RCNN in a unified deep framework. Second, we introduce a novel regularized transfer learning framework for low-shot detection, where the transfer knowledge (TK) and background depression (BD) regularizations are proposed to leverage object knowledge respectively from source and target domains, in order to further enhance fine-tuning with a few target images. Finally, we examine our LSTD on a number of challenging low-shot detection experiments, where LSTD outperforms other state-of-the-art approaches. The results demonstrate that LSTD is a preferable deep detector for low-shot scenarios.

[1] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[2] Yi Yang,et al. Few-Shot Object Recognition from Machine-Labeled Web Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Cordelia Schmid,et al. Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Daan Wierstra,et al. One-shot Learning with Memory-Augmented Neural Networks , 2016, ArXiv.

[5] Deyu Meng,et al. Few-Example Object Detection with Model Communication , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.

[7] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[8] Ming-Hsuan Yang,et al. Weakly Supervised Object Localization with Progressive Domain Adaptation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[10] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[11] Andrea Vedaldi,et al. Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Yunchao Wei,et al. Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13] Yi Yang,et al. Few-shot Object Detection , 2017, ArXiv.

[14] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[15] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[16] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[17] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[19] Ivan Laptev,et al. ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization , 2016, ECCV.

[20] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.

[21] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[22] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.

[23] Yuxing Tang,et al. Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Trevor Darrell,et al. LSDA: Large Scale Detection through Adaptation , 2014, NIPS.

[25] Yang Wang,et al. Attention Networks for Weakly Supervised Object Localization , 2016, BMVC.

[26] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Luc Van Gool,et al. Weakly Supervised Cascaded Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[30] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[31] Yong Jae Lee,et al. Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Chong Wang,et al. Weakly Supervised Object Localization with Latent Category Learning , 2014, ECCV.

[33] Bharath Hariharan,et al. Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[34] Joshua B. Tenenbaum,et al. Human-level concept learning through probabilistic program induction , 2015, Science.

[35] Wenyu Liu,et al. Multiple Instance Detection Network with Online Instance Classifier Refinement , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[37] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.