Counterfactual Zero-Shot and Open-Set Visual Recognition

We present a novel counterfactual framework for both Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR), whose common challenge is generalizing to unseen classes after training only on seen classes. Our idea stems from the observation that samples generated for unseen classes often fall outside the true distribution, which causes a severe imbalance in recognition rate between seen classes (high) and unseen classes (low). We show that the key reason is that the generation is not Counterfactual Faithful, and we therefore propose a faithful one, whose generation follows from the sample-specific counterfactual question: What would the sample look like if we set its class attribute to a certain class while keeping its sample attribute unchanged? Thanks to this faithfulness, we can apply the Consistency Rule to perform unseen/seen binary classification by asking: Would its counterfactual still look like itself? If "yes", the sample is from that class; if "no", it is not. Through extensive experiments on ZSL and OSR, we demonstrate that our framework effectively mitigates the seen/unseen imbalance and hence significantly improves overall performance. Because the framework is orthogonal to existing methods, it can serve as a new baseline for evaluating how ZSL/OSR models generalize. Code is available at https://github.com/yue-zhongqi/gcm-cf.
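The counterfactual question and the Consistency Rule described above can be illustrated with a small sketch. This is a toy under stated assumptions, not the generative model from the paper or its repository: it assumes a linear model in which a sample is a class vector plus a sample-specific offset in an orthogonal subspace, and the function names, the `sample_basis` matrix, and the distance `threshold` are all hypothetical.

```python
import numpy as np

def encode_sample_attribute(x, sample_basis):
    # Assumed toy encoder: project x onto the sample-attribute subspace.
    return sample_basis.T @ x

def generate_counterfactual(z, class_vec, sample_basis):
    # "What would the sample look like with this class attribute,
    # keeping its sample attribute z unchanged?"
    return class_vec + sample_basis @ z

def is_seen(x, seen_class_vecs, sample_basis, threshold):
    # Consistency Rule (toy version): x is judged "seen" if at least one
    # seen-class counterfactual still looks like x itself.
    z = encode_sample_attribute(x, sample_basis)
    dists = [
        np.linalg.norm(x - generate_counterfactual(z, c, sample_basis))
        for c in seen_class_vecs
    ]
    return bool(min(dists) <= threshold)

# Hypothetical 4-D feature space: the first two dimensions carry the class
# attribute, the last two carry the sample attribute (orthogonal by design).
sample_basis = np.array([[0.0, 0.0],
                         [0.0, 0.0],
                         [1.0, 0.0],
                         [0.0, 1.0]])
seen_class_vecs = [np.array([1.0, 0.0, 0.0, 0.0]),
                   np.array([0.0, 1.0, 0.0, 0.0])]

x_from_seen = np.array([1.0, 0.0, 0.3, -0.2])     # built from seen class 0
x_from_unseen = np.array([-1.0, -1.0, 0.5, 0.1])  # built from an unseen class

seen_flag = is_seen(x_from_seen, seen_class_vecs, sample_basis, threshold=0.5)
unseen_flag = is_seen(x_from_unseen, seen_class_vecs, sample_basis, threshold=0.5)
```

In this toy setting the counterfactual for the true seen class reproduces the sample exactly, whereas a sample built from an unseen class stays far from every seen-class counterfactual. With a learned generator, the distance measure and threshold would have to be chosen empirically.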
