Detecting Advertising Materials via Multi-Scale Instance Segmentation Network

In this paper, we introduce a new application of computer vision: detecting advertising materials in commodity pictures. Unlike conventional detection tasks, advertising material detection depends heavily on capturing both the semantic and the contour information of candidate targets. First, we adopt a fully convolutional instance segmentation network to capture the semantic and link information of pixels. Second, we introduce an atrous spatial pyramid pooling (ASPP) module and a multi-scale prediction structure to handle materials of various scales. Third, we jointly optimize the network with a semantic loss, a link loss, and a contour loss to obtain finer detection results. Finally, we provide a dataset specifically targeting advertising material detection. Experiments on this dataset show promising performance for the proposed method.
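
The joint optimization described above can be illustrated with a minimal sketch. The abstract does not give the form of the individual losses or how they are weighted, so the code below assumes that each term is a mean binary cross-entropy over per-pixel predictions and that the three terms are combined as a weighted sum. The names bce and joint_loss and the weights w_sem, w_link, and w_contour are hypothetical, not taken from the paper.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        # Clip probabilities away from 0 and 1 so the logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def joint_loss(sem, link, contour, w_sem=1.0, w_link=1.0, w_contour=1.0):
    """Weighted sum of three per-pixel losses (assumed form, not the paper's).

    Each of sem, link, and contour is a (probs, targets) pair of flat lists.
    The weights are hypothetical hyperparameters.
    """
    return (
        w_sem * bce(*sem)
        + w_link * bce(*link)
        + w_contour * bce(*contour)
    )


if __name__ == "__main__":
    # Toy predictions for four pixels per branch.
    semantic = ([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0])
    links = ([0.7, 0.6, 0.3, 0.2], [1, 1, 0, 0])
    contours = ([0.4, 0.1, 0.9, 0.2], [0, 0, 1, 0])
    print(joint_loss(semantic, links, contours, w_contour=0.5))
```

Under these assumptions, setting a weight to zero removes the corresponding branch from the objective, which gives a simple way to compare training with and without the contour term.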
