Paper Information - Multiple-Instance Video Segmentation with Sequence-Specific Object Proposals

We present a novel approach to video segmentation that placed 4th in the DAVIS Challenge 2017. The method has two main components. In the first part, we extract video object proposals from each frame: we develop a new algorithm, based on the one-shot video object segmentation (OSVOS) algorithm, to generate sequence-specific proposals that match the human-annotated proposals in the first frame. This set is populated with proposals from the fully convolutional instance-aware semantic segmentation (FCIS) algorithm. In the second part, we use the segment proposal tracking (SPT) algorithm to track object proposals over time and generate spatio-temporal video object proposals. This approach learns video segments by bootstrapping them from temporally consistent object proposals, which can start from any frame. We extend this approach with a semi-Markov motion model to provide appearance-motion multi-target inference, backtracking of a segment that starts at frame T to the first frame, and a "re-tracking" capability that learns a better object appearance model after inference is complete. With a dense CRF refinement method, this model achieved 61.5% overall accuracy in the DAVIS Challenge 2017.
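As a rough illustration of what matching candidate proposals to a first-frame annotation could look like, the sketch below ranks candidate masks by intersection-over-union (IoU) with an annotated mask. The abstract does not state the matching criterion, so the use of IoU, the `min_iou` threshold, and the names `mask_iou` and `select_matching_proposals` are all assumptions made for illustration, not details of the authors' method.

```python
from typing import List, Sequence, Tuple

# A binary mask as rows of 0/1 values (hypothetical representation).
Mask = Sequence[Sequence[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    # Two empty masks have no overlap to measure; report 0.0.
    return inter / union if union else 0.0


def select_matching_proposals(
    annotation: Mask,
    proposals: Sequence[Mask],
    min_iou: float = 0.5,  # assumed threshold, not from the paper
) -> List[Tuple[int, float]]:
    """Return (proposal index, IoU) pairs with IoU >= min_iou, best first."""
    scored = [(i, mask_iou(annotation, p)) for i, p in enumerate(proposals)]
    kept = [(i, s) for i, s in scored if s >= min_iou]
    return sorted(kept, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    annotation = [[1, 1], [0, 0]]
    proposals = [
        [[1, 1], [0, 0]],  # identical to the annotation
        [[1, 0], [0, 0]],  # covers half of the annotation
        [[0, 0], [1, 1]],  # disjoint from the annotation
    ]
    for index, score in select_matching_proposals(annotation, proposals):
        print(index, score)
```

A real pipeline would operate on full-resolution masks, typically as arrays, and would combine overlap with appearance cues; this sketch only shows the overlap-based filtering step.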

[1] Vladlen Koltun, et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, 2011, NIPS.

[2] James M. Rehg, et al. Video Segmentation by Tracking Many Figure-Ground Segments, 2013, 2013 IEEE International Conference on Computer Vision.

[3] James M. Rehg, et al. Robust video segment proposals with painless occlusion handling, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Luc Van Gool, et al. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Ronan Collobert, et al. Learning to Refine Object Segments, 2016, ECCV.

[6] Yi Li, et al. Fully Convolutional Instance-Aware Semantic Segmentation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Thomas Brox, et al. Lucid Data Dreaming for Object Tracking, 2017, ArXiv.

[8] Luc Van Gool, et al. One-Shot Video Object Segmentation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Kaiming He, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10] Jordi Pont-Tuset, et al. Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.