Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN

The dominant object detection approaches treat each dataset separately and fit towards a specific domain, which cannot adapt to other domains without extensive retraining. In this paper, we address the problem of designing a universal object detection model that exploits diverse category granularity from multiple domains and predict all kinds of categories in one system. Existing works treat this problem by integrating multiple detection branches upon one shared backbone network. However, this paradigm overlooks the crucial semantic correlations between multiple domains, such as categories hierarchy, visual similarity, and linguistic relationship. To address these drawbacks, we present a novel universal object detector called Universal-RCNN that incorporates graph transfer learning for propagating relevant semantic information across multiple datasets to reach semantic coherency. Specifically, we first generate a global semantic pool by integrating all high-level semantic representation of all the categories. Then an Intra-Domain Reasoning Module learns and propagates the sparse graph representation within one dataset guided by a spatial-aware GCN. Finally, an Inter-Domain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally. Extensive experiments demonstrate that the proposed method significantly outperforms multiple-branch models and achieves the state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on COCO).

[1]  Ke Gong,et al.  Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Shuicheng Yan,et al.  Human Parsing with Contextualized Convolutional Neural Network , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Martial Hebert,et al.  From Red Wine to Red Tomato: Composition with Context , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Tao Xiang,et al.  Joint Semantic and Latent Attribute Modelling for Cross-Class Transfer Learning , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Abhinav Gupta,et al.  Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Liang Lin,et al.  Hybrid Knowledge Routed Modules for Large-scale Object Detection , 2018, NeurIPS.

[10]  Ruslan Salakhutdinov,et al.  Learning Robust Visual-Semantic Embeddings , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Trevor Darrell,et al.  Learning to Segment Every Thing , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Yueting Zhuang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[14]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[16]  Xiaodan Liang,et al.  Spatial-Aware Graph Relation Network for Large-Scale Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xiangyu Zhang,et al.  DetNet: A Backbone network for Object Detection , 2018, ArXiv.

[18]  Yichen Wei,et al.  Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[20]  Di He,et al.  FRAGE: Frequency-Agnostic Word Representation , 2018, NeurIPS.

[21]  Leonidas J. Guibas,et al.  Taskonomy: Disentangling Task Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[23]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Xinlei Chen,et al.  Spatial Memory for Context Reasoning in Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[26]  Ernest Valveny,et al.  Word Spotting and Recognition with Embedded Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Yuning Jiang,et al.  Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.

[28]  Nikos Komodakis,et al.  Dynamic Few-Shot Visual Learning Without Forgetting , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Trevor Darrell,et al.  LSDA: Large Scale Detection through Adaptation , 2014, NIPS.

[30]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Pierre Vandergheynst,et al.  Geometric Deep Learning: Going beyond Euclidean data , 2016, IEEE Signal Process. Mag..

[32]  Zhiming Luo,et al.  Non-local Deep Features for Salient Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yang Song,et al.  Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jonathan Masci,et al.  Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Xiaodan Liang,et al.  Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Xinlei Chen,et al.  Iterative Visual Reasoning Beyond Convolutions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Shiguang Shan,et al.  Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[41]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[42]  Min Bai,et al.  UPSNet: A Unified Panoptic Segmentation Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Ming-Hsuan Yang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.