Context-sensitive zero-shot semantic segmentation model based on meta-learning

Abstract: Zero-shot semantic segmentation requires models with strong image understanding. Most current solutions rely on direct mapping or generation. These schemes are effective for zero-shot recognition, but they cannot fully transfer the visual dependence between objects in the more complex scenes of semantic segmentation. More importantly, their predictions are seriously biased toward the seen categories in the training set, which makes it difficult to recognize unseen categories accurately. To address these two problems, we propose a novel zero-shot semantic segmentation model based on meta-learning. We observe that a purely semantic-space representation has limitations for zero-shot learning. Therefore, building on the original semantic transfer, we first transfer the shared information in the visual space by adding a context module, and then transfer it in the joint visual and semantic space. At the same time, to counter the bias, we use meta-learning to adjust the dual-space parameters and improve their adaptability, so that the model can complete the segmentation even for new categories without reference samples. Experiments show that our algorithm outperforms the best existing zero-shot segmentation methods on three datasets: Pascal VOC 2012, Pascal Context, and COCO-Stuff.
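For orientation, the direct-mapping baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the proposed model: it assumes pixel features have already been projected into the semantic space, and the function name `zero_shot_segment` and its arguments are hypothetical. Each pixel is assigned the class whose semantic embedding has the highest cosine similarity, so an unseen class can be predicted as long as its embedding is available.

```python
import numpy as np


def zero_shot_segment(pixel_features, class_embeddings):
    """Assign each pixel the class with the most similar semantic embedding.

    pixel_features:   (H, W, D) array of per-pixel features, assumed to be
                      already projected into the semantic space.
    class_embeddings: (C, D) array, one semantic vector per class
                      (seen and unseen classes alike).
    Returns an (H, W) integer array of class indices.
    """
    eps = 1e-8  # guards against division by zero for all-zero vectors
    f = pixel_features / (np.linalg.norm(pixel_features, axis=-1, keepdims=True) + eps)
    e = class_embeddings / (np.linalg.norm(class_embeddings, axis=-1, keepdims=True) + eps)
    scores = f @ e.T  # cosine similarity per pixel and class, shape (H, W, C)
    return scores.argmax(axis=-1)
```

A baseline of this kind labels every pixel independently of its neighbours and has no mechanism against the seen-category bias, which are the two limitations the abstract sets out to address with the context module and meta-learning.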
