Deep Learning And Interactivity For Video Rotoscoping

In this work we extend the idea of object co-segmentation [10] to perform interactive video segmentation. Our framework simultaneously predicts the coordinates of vertices along the boundary of an object in two frames of a video. The predicted vertices are interactive: a user correction on one frame helps the network correct its predictions for both frames. We employ an attention mechanism at the encoder stage and a simple combination network at the decoder stage, which allows the network to perform this simultaneous correction efficiently. The framework is also robust to the distance between the two input frames, handling a gap of up to 50 frames between them. We train our model on a professional dataset consisting of pixel-accurate annotations made by professional roto artists. We test our model on DAVIS [15] and achieve state-of-the-art results in both automatic and interactive modes, surpassing Curve-GCN [11] and PolyRNN++ [1].

[1]  Hong Chen,et al.  Semantic Aware Attention Based Deep Object Co-segmentation , 2018, ACCV.

[2]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[3]  Cem Yuksel,et al.  Parameterization and applications of Catmull-Rom curves , 2011, Comput. Aided Des.

[4]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[5]  Andrew S. Glassner,et al.  Graphics Gems , 1990.

[6]  Sanja Fidler,et al.  Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++ , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Philip J. Schneider,et al.  An algorithm for automatically fitting digitized curves , 1990.

[9]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[10]  Leonie Moench,et al.  Graphics Gems I , 2016.

[11]  Carsten Rother,et al.  Deep Object Co-Segmentation , 2018, ACCV.

[12]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  B. Leibe,et al.  PReMVOS: Proposal-generation, Refinement and Merging for the DAVIS Challenge on Video Object Segmentation 2018 , 2018.

[14]  Pascal Poupart,et al.  Unsupervised Video Object Segmentation for Deep Reinforcement Learning , 2018, NeurIPS.

[15]  Sanja Fidler,et al.  Fast Interactive Object Annotation With Curve-GCN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Luc Van Gool,et al.  Deep Extreme Cut: From Extreme Points to Object Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Hod Lipson,et al.  Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.

[18]  Sungeun Hong,et al.  Key Instance Selection for Unsupervised Video Object Segmentation , 2019, ArXiv.

[19]  A. Savitzky,et al.  Smoothing and Differentiation of Data by Simplified Least Squares Procedures , 1964.

[20]  Ning Xu,et al.  Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaoxiao Li,et al.  Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation , 2018, ECCV.