Semantic Aware Attention Based Deep Object Co-segmentation

Object co-segmentation is the task of segmenting the same objects from multiple images. In this paper, we propose Attention Based Object Co-Segmentation, a model that uses a novel attention mechanism in the bottleneck layer of a deep neural network to select semantically related features. Furthermore, we take advantage of the attention learner and propose an algorithm that segments multiple input images in time linear in the number of images. Experimental results demonstrate that our model achieves state-of-the-art performance on multiple datasets with a significant reduction in computational time.
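The abstract does not spell out how the attention learner works. The sketch below makes two assumptions. First, the attention is a Squeeze-and-Excitation-style channel gate on the bottleneck features. Second, a group of images is summarized by averaging the per-image attention vectors. Under these assumptions, each image is visited once, so cost grows linearly with the number of images instead of quadratically as it would with pairwise comparison. All names (`channel_attention`, `apply_attention`, `group_attention_features`, `w1`, `w2`) and the averaging rule are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List

# Hypothetical types for illustration: a bottleneck feature map is stored as
# C channels, each a flattened list of H*W activations.
Vector = List[float]
Matrix = List[List[float]]
FeatureMap = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (a fully connected layer without bias)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def channel_attention(feat: FeatureMap, w1: Matrix, w2: Matrix) -> Vector:
    """SE-style gate: global average pool -> FC + ReLU -> FC + sigmoid.

    Returns one weight in (0, 1) per channel. w1 and w2 stand in for
    learned weights (assumed, not taken from the paper).
    """
    pooled = [sum(ch) / len(ch) for ch in feat]          # squeeze: one value per channel
    hidden = [max(0.0, h) for h in matvec(w1, pooled)]   # FC + ReLU
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]  # FC + sigmoid


def apply_attention(feat: FeatureMap, att: Vector) -> FeatureMap:
    """Scale every activation of channel c by att[c]."""
    return [[att[c] * v for v in ch] for c, ch in enumerate(feat)]


def group_attention_features(feats: List[FeatureMap], w1: Matrix,
                             w2: Matrix) -> List[FeatureMap]:
    """Reweight N bottleneck maps with one shared, group-level attention.

    Each image gets one attention pass and one reweighting pass, so the
    cost is linear in N; no pair of images is ever compared directly.
    Averaging the per-image vectors is an assumed fusion rule.
    """
    atts = [channel_attention(f, w1, w2) for f in feats]
    n_channels = len(atts[0])
    group = [sum(a[c] for a in atts) / len(atts) for c in range(n_channels)]
    return [apply_attention(f, group) for f in feats]


# Usage: two toy "images", each with 2 channels of 2 activations.
feat_a = [[1.0, 3.0], [0.0, 0.0]]
feat_b = [[2.0, 2.0], [4.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights for w1 and w2
out = group_attention_features([feat_a, feat_b], identity, identity)
```

Channels that are strongly active across the whole group receive larger shared weights, which is one simple way to emphasize features common to all images. Adding an image adds one more attention pass rather than a comparison against every other image.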
