Generalization and Robustness Implications in Object-Centric Learning

The idea behind object-centric representation learning is that natural scenes can be modeled more effectively as compositions of objects and their relations than as distributed representations. This inductive bias can be injected into neural networks to potentially improve the systematic generalization and learning efficiency of downstream tasks in scenes with multiple objects. In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation accuracy and downstream object property prediction. In addition, we study systematic generalization and robustness by investigating settings where either single objects are out-of-distribution—e.g., having unseen colors, textures, or shapes—or global properties of the scene are altered—e.g., by occlusions, cropping, or an increased number of objects. From our experimental study, we find object-centric representations to be generally useful for downstream tasks and robust to shifts in the data distribution, especially when those shifts affect single objects.
