Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations

Learning visual concepts from raw images without strong supervision is a challenging task. In this work, we show the advantages of prototype representations for understanding and revising the latent space of neural concept learners. For this purpose, we introduce interactive Concept Swapping Networks (iCSNs), a novel framework for learning concept-grounded representations via weak supervision and implicit prototype representations. iCSNs learn to bind conceptual information to specific prototype slots by swapping the latent representations of paired images. This semantically grounded and discrete latent space facilitates human understanding and human-machine interaction. We support this claim with experiments on our novel dataset "Elementary Concept Reasoning" (ECR), which focuses on visual concepts shared by geometric objects.
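The core mechanism described above — exchanging the latent code of a shared concept between a pair of images so that each slot is forced to encode exactly one concept — can be illustrated with a minimal sketch. This is an illustrative toy, not the authors' implementation: the slot-structured latent layout and the `swap_slot` helper are assumptions, and real iCSNs would obtain these codes from an encoder and pass the swapped codes through a decoder with a reconstruction loss.

```python
import numpy as np

def swap_slot(z_a, z_b, k):
    """Exchange the k-th concept slot between two latent codes.

    z_a, z_b: arrays of shape (num_slots, slot_dim), the slot-structured
    latent codes of a paired image (hypothetical encoder output).
    Returns new codes with slot k swapped and all other slots unchanged.
    """
    z_a_new, z_b_new = z_a.copy(), z_b.copy()
    z_a_new[k], z_b_new[k] = z_b[k].copy(), z_a[k].copy()
    return z_a_new, z_b_new

# Toy latents: 3 concept slots (e.g., shape, size, color), each 4-dimensional.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(3, 4))
z_b = rng.normal(size=(3, 4))

# Swap slot 1; if the paired images share that concept, decoding the
# swapped codes should still reconstruct the original images, which is
# what pressures slot 1 to carry exactly that concept.
z_a_sw, z_b_sw = swap_slot(z_a, z_b, k=1)
assert np.allclose(z_a_sw[1], z_b[1])   # slot 1 now comes from the partner
assert np.allclose(z_a_sw[0], z_a[0])   # remaining slots are untouched
assert np.allclose(z_a_sw[2], z_a[2])
```

In training, such a swap is only informative when combined with weak pair supervision (knowing which concept the pair shares) and a reconstruction objective on the swapped codes.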
