Deep Variational Instance Segmentation

Instance Segmentation, which seeks to obtain both class and instance labels for each pixel in the input image, is a challenging task in computer vision. State-of-the-art algorithms often employ two separate stages, the first one generating object proposals and the second one recognizing and refining the boundaries. Further, proposals are usually based on detectors such as faster R-CNN which search for boxes in the entire image exhaustively. In this paper, we propose a novel algorithm that directly utilizes a fully convolutional network (FCN) to predict instance labels. Specifically, we propose a variational relaxation of instance segmentation as minimizing an optimization functional for a piecewise-constant segmentation problem, which can be used to train an FCN end-to-end. It extends the classical Mumford-Shah variational segmentation problem to be able to handle permutation-invariant labels in the ground truth of instance segmentation. Experiments on PASCAL VOC 2012, Semantic Boundaries dataset(SBD), and the MSCOCO 2017 dataset show that the proposed approach efficiently tackle the instance segmentation task. The source code and trained models will be released with the paper.

[1]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[3]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[4]  Daniel Cremers,et al.  Real-Time Minimization of the Piecewise Smooth Mumford-Shah Functional , 2014, ECCV.

[5]  Sanja Fidler,et al.  Instance-Level Segmentation for Autonomous Driving with Deep Densely Connected MRFs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Donghoon Lee,et al.  Individualness and Determinantal Point Processes for Pedestrian Detection , 2016, ECCV.

[7]  Philip H. S. Torr,et al.  Recurrent Instance Segmentation , 2015, ECCV.

[8]  Sanja Fidler,et al.  SGN: Sequential Grouping Networks for Instance Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Sanja Fidler,et al.  Monocular Object Instance Segmentation and Depth Ordering with CNNs , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  XuYi,et al.  Image smoothing via L0 gradient minimization , 2011 .

[12]  Tao Kong,et al.  SOLOv2: Dynamic, Faster and Stronger , 2020, ArXiv.

[13]  Philip H. S. Torr,et al.  Pixelwise Instance Segmentation with a Dynamically Instantiated Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Peng Wang,et al.  Semantic Instance Segmentation via Deep Metric Learning , 2017, ArXiv.

[15]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[16]  Shu Kong,et al.  Recurrent Pixel Embedding for Instance Grouping , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[19]  Trevor Darrell,et al.  Learning Detection with Diverse Proposals , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  George Papandreou,et al.  MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Tao Kong,et al.  SOLOv2: Dynamic and Fast Instance Segmentation , 2020, NeurIPS.

[22]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[23]  Thomas Brox,et al.  Pixel-Level Encoding and Depth Layering for Instance-Level Semantic Labeling , 2016, GCPR.

[24]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Chunhua Shen,et al.  PolarMask: Single Shot Instance Segmentation With Polar Representation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Gabriela Csurka,et al.  What is a good evaluation measure for semantic segmentation? , 2013, BMVC.

[27]  Jonathan T. Barron,et al.  A General and Adaptive Robust Loss Function , 2017, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Yong Jae Lee,et al.  YOLACT++: Better Real-time Instance Segmentation , 2020, IEEE transactions on pattern analysis and machine intelligence.

[29]  Yi Li,et al.  Fully Convolutional Instance-Aware Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  D. Mumford,et al.  Optimal approximations by piecewise smooth functions and associated variational problems , 1989 .

[31]  Yuning Jiang,et al.  SOLO: Segmenting Objects by Locations , 2020, ECCV.

[32]  Kai Chen,et al.  MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[33]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[34]  Thomas Brox,et al.  Box2Pix: Single-Shot Instance Segmentation by Assigning Pixels to Object Boxes , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[35]  Tony F. Chan,et al.  A Multiphase Level Set Framework for Image Segmentation Using the Mumford and Shah Model , 2002, International Journal of Computer Vision.

[36]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Min Bai,et al.  Deep Watershed Transform for Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Carsten Rother,et al.  InstanceCut: From Edges to Instances with MultiCut , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[40]  Yongchao Gong,et al.  Mask Scoring R-CNN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  F. Moldoveanu,et al.  Image segmentation based on active contours without edges , 2012, 2012 IEEE 8th International Conference on Intelligent Computer Communication and Processing.

[42]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[44]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[45]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[46]  Jitendra Malik,et al.  Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation. , 2017, IEEE transactions on pattern analysis and machine intelligence.

[47]  Christopher V. Alvino,et al.  Reformulating and Optimizing the Mumford-Shah Functional on a Graph - A Faster, Lower Energy Solution , 2008, ECCV.

[48]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Mila Nikolova,et al.  Algorithms for Finding Global Minimizers of Image Segmentation and Denoising Models , 2006, SIAM J. Appl. Math..

[51]  Yong Jae Lee,et al.  YOLACT: Real-Time Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[52]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[53]  Richard S. Zemel,et al.  End-to-End Instance Segmentation and Counting with Recurrent Attention , 2016, ArXiv.