Object Detection with Task Description Only

Deep Convolutional Neural Network models achieve state-of-the-art performance on object detection, but it is driven by large labeled data sets. For general practical object detection problems, the existing public data set is no longer applicable, and there is no ready-made data set to solve the problem, usually only the description of object detection task is available. The method of collecting data sets from real world is really time consuming and unreachable in some special scenarios. In this paper, we define this problem as a new object detection problem named Task Description Object Detection, and present a synthetic image-based method to solve this problem. We design a system to generate annotated synthetic images quickly and inexpensively according to the task description. What's more, we propose a novel layer called SOMConv for object detection network to adapt the object detection model trained with the synthetic data sets to detection in real world. Our experiments evidence the effectiveness of our approach on real-world object detection problems, which only has the description of the object detection problem and no real images.

[1]  Germán Ros,et al.  CARLA: An Open Urban Driving Simulator , 2017, CoRL.

[2]  Donald A. Adjeroh,et al.  Unified Deep Supervised Domain Adaptation and Generalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[4]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Teuvo Kohonen,et al.  The self-organizing map , 1990, Neurocomputing.

[6]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Luc Van Gool,et al.  ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Yi Zhang,et al.  UnrealCV: Virtual Worlds for Computer Vision , 2017, ACM Multimedia.

[10]  Luc Van Gool,et al.  Deep Domain Adaptation by Geodesic Distance Minimization , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[11]  Yeongjae Cheon,et al.  PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection , 2016, ArXiv.

[12]  Kiyoharu Aizawa,et al.  Cross-Domain Weakly-Supervised Object Detection Through Progressive Domain Adaptation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Luc Van Gool,et al.  Domain Adaptive Faster R-CNN for Object Detection in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[16]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[17]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[18]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[19]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[20]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[22]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[23]  Neil A. Dodgson,et al.  Rendering synthetic ground truth images for eye tracker evaluation , 2014, ETRA.

[24]  Kristen Grauman,et al.  What's it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.