CenterMask: Real-Time Anchor-Free Instance Segmentation

We propose a simple yet efficient anchor-free instance segmentation, called CenterMask, that adds a novel spatial attention-guided mask (SAG-Mask) branch to anchor-free one stage object detector (FCOS) in the same vein with Mask R-CNN. Plugged into the FCOS object detector, the SAG-Mask branch predicts a segmentation mask on each box with the spatial attention map that helps to focus on informative pixels and suppress noise. We also present an improved backbone networks, VoVNetV2, with two effective strategies: (1) residual connection for alleviating the optimization problem of larger VoVNet \cite{lee2019energy} and (2) effective Squeeze-Excitation (eSE) dealing with the channel information loss problem of original SE. With SAG-Mask and VoVNetV2, we deign CenterMask and CenterMask-Lite that are targeted to large and small models, respectively. Using the same ResNet-101-FPN backbone, CenterMask achieves 38.3%, surpassing all previous state-of-the-art methods while at a much faster speed. CenterMask-Lite also outperforms the state-of-the-art by large margins at over 35fps on Titan Xp. We hope that CenterMask and VoVNetV2 can serve as a solid baseline of real-time instance segmentation and backbone network for various vision tasks, respectively. The Code is available at https://github.com/youngwanLEE/CenterMask.

[1]  Jongyoul Park,et al.  An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[3]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[4]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[5]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Stephen Lin,et al.  An Empirical Study of Spatial Attention Mechanisms in Deep Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Tat-Seng Chua,et al.  SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yongchao Gong,et al.  Mask Scoring R-CNN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jitendra Malik,et al.  ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Yong Jae Lee,et al.  YOLACT: Real-Time Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Kai Chen,et al.  Hybrid Task Cascade for Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Yongjin Kwon,et al.  Vision‐based garbage dumping action detection for real‐world surveillance platform , 2019, ETRI Journal.

[18]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[22]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[23]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[24]  Alexander C. Berg,et al.  RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free , 2019, ArXiv.

[25]  Gang Sun,et al.  Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks , 2018, NeurIPS.

[26]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[27]  Dong Liu,et al.  High-Resolution Representations for Labeling Pixels and Regions , 2019, ArXiv.

[28]  Kaiming He,et al.  Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xinlei Chen,et al.  TensorMask: A Foundation for Dense Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Youngjoo Jo,et al.  SC-FEGAN: Face Editing Generative Adversarial Network With User’s Sketch and Color , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Yuxing Peng,et al.  ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Zhaoxiang Zhang,et al.  Scale-Aware Trident Networks for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).