Data augmentation of random grid-hiding for video object segmentation

Video object segmentation is an important problem in computer vision, but it faces unavoidable challenges such as background clutter, occlusion, and edge ambiguity. In addition, existing labeled video object segmentation datasets are limited in size, which prevents CNN models from reaching their full generalization capability. In this paper, we propose a novel data augmentation approach called random grid-hiding (RGH). We divide each training image into several rectangular regions and randomly hide some of them during model training. The convolutional neural network thus automatically focuses on the discriminative parts of the image, and when the most discriminative part is hidden, it is compelled to focus on other relevant parts. Random hiding also generates occluded images at various levels of occlusion. Because random grid-hiding exposes the network to more features, it effectively reduces the risk of overfitting. Our approach is an effective extension of existing data augmentation techniques (such as random cropping and random flipping), and it improves the accuracy of a video object segmentation method on the DAVIS dataset. Our experimental results show that the proposed method is a stable and effective means of data augmentation.
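The grid-hiding step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `random_grid_hide`, the 4x4 grid, the independent per-cell hiding probability `hide_prob`, and the constant `fill_value` are all assumptions chosen for illustration, since the abstract does not specify the grid size, how regions are selected, or what hidden regions are filled with.

```python
import numpy as np


def random_grid_hide(image, grid_rows=4, grid_cols=4, hide_prob=0.25,
                     fill_value=0, rng=None):
    """Return a copy of `image` with randomly chosen grid cells overwritten.

    All parameters are illustrative assumptions, not values from the paper.
    `image` is an array of shape (H, W) or (H, W, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()  # leave the caller's array untouched
    height, width = image.shape[:2]

    # Cell boundaries; linspace also handles sizes that are not
    # evenly divisible by the number of grid cells.
    row_edges = np.linspace(0, height, grid_rows + 1).astype(int)
    col_edges = np.linspace(0, width, grid_cols + 1).astype(int)

    # One independent draw per cell decides whether it is hidden
    # (an assumed selection rule).
    hidden = rng.random((grid_rows, grid_cols)) < hide_prob

    for r in range(grid_rows):
        for c in range(grid_cols):
            if hidden[r, c]:
                out[row_edges[r]:row_edges[r + 1],
                    col_edges[c]:col_edges[c + 1]] = fill_value
    return out, hidden


if __name__ == "__main__":
    frame = np.ones((480, 854, 3), dtype=np.float32)  # placeholder frame
    augmented, hidden_cells = random_grid_hide(
        frame, rng=np.random.default_rng(0))
    print(augmented.shape, int(hidden_cells.sum()), "cells hidden")
```

In a training pipeline, such a function would typically be applied to each image on the fly, alongside random cropping and flipping, so that a different set of regions is hidden in every epoch. Whether the ground-truth segmentation mask should be left unchanged under hidden regions is a design choice the abstract does not address.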
