Multiple-Instance Video Segmentation with Sequence-Specific Object Proposals

We present a novel approach to video segmentation which won the 4th place in DAVIS challenge 2017. The method has two main components: in the first part we extract video object proposals from each frame. We develop a new algorithm based on one-shot video segmentation (OSVOS) algorithm to generate sequence-specific proposals that match to the human-annotated proposals in the first frame. This set is populated by the proposals from fully convolutional instance-aware image segmentation algorithm (FCIS). Then, we use the segment proposal tracking (SPT) algorithm to track object proposals in time and generate the spatio-temporal video object proposals. This approach learns video segments by bootstrapping them from temporally consistent object proposals, which can start from any frame. We extend this approach with a semi-Markov motion model to provide appearance motion multi-target inference, backtracking a segment started from frame T to the 1st frame, and a ”re-tracking” capability that learns a better object appearance model after inference has been done. With a dense CRF refinement method, this model achieved 61.5% overall accuracy in DAVIS challenge 2017.

[1]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[2]  James M. Rehg,et al.  Video Segmentation by Tracking Many Figure-Ground Segments , 2013, 2013 IEEE International Conference on Computer Vision.

[3]  James M. Rehg,et al.  Robust video segment proposals with painless occlusion handling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[6]  Yi Li,et al.  Fully Convolutional Instance-Aware Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Thomas Brox,et al.  Lucid Data Dreaming for Object Tracking , 2017, ArXiv.

[8]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Jordi Pont-Tuset,et al.  Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.