Unsupervised Learning of Object Keypoints for Perception and Control

The study of object representations in computer vision has primarily focused on developing representations that are useful for image classification, object detection, or semantic segmentation as downstream tasks. In this work we aim to learn object representations that are useful for control and reinforcement learning (RL). To this end, we introduce Transporter, a neural network architecture for discovering concise geometric object representations in terms of keypoints or image-space coordinates. Our method learns from raw video frames in a fully unsupervised manner, by transporting learnt image features between video frames using a keypoint bottleneck. The discovered keypoints track objects and object parts across long time-horizons more accurately than recent similar methods. Furthermore, consistent long-term tracking enables two notable results in control domains -- (1) using the keypoint co-ordinates and corresponding image features as inputs enables highly sample-efficient reinforcement learning; (2) learning to explore by controlling keypoint locations drastically reduces the search space, enabling deep exploration (leading to states unreachable through random action exploration) without any extrinsic rewards.

[1]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[2]  Martin A. Riedmiller Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.

[3]  Katherine D. Kinzler,et al.  Core knowledge. , 2007, Developmental science.

[4]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[8]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[9]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[10]  Trevor Darrell,et al.  Fully convolutional networks for semantic segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[13]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..

[14]  Joshua B. Tenenbaum,et al.  Understanding Visual Concepts with Continuation Learning , 2016, ArXiv.

[15]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[16]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[17]  Jiajun Wu,et al.  Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks , 2016, NIPS.

[18]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[19]  Sergey Levine,et al.  Deep spatial autoencoders for visuomotor learning , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[20]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[21]  Alexei A. Efros,et al.  Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[22]  Adversarial Examples THERMOMETER ENCODING: ONE HOT WAY TO RESIST , 2017 .

[23]  Andrea Vedaldi,et al.  Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[25]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[26]  Ankush Gupta,et al.  Unsupervised Learning of Object Landmarks through Conditional Image Generation , 2018, NeurIPS.

[27]  Tom Schaul,et al.  Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.

[28]  Marcin Andrychowicz,et al.  Parameter Space Noise for Exploration , 2017, ICLR.

[29]  David Budden,et al.  Distributed Prioritized Experience Replay , 2018, ICLR.

[30]  Pascal Poupart,et al.  Unsupervised Video Object Segmentation for Deep Reinforcement Learning , 2018, NeurIPS.

[31]  Qin Huang,et al.  Instance Embedding Transfer to Unsupervised Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Colin Raffel,et al.  Thermometer Encoding: One Hot Way To Resist Adversarial Examples , 2018, ICLR.

[33]  Yuting Zhang,et al.  Unsupervised Discovery of Object Landmarks as Structural Representations , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Yuval Tassa,et al.  DeepMind Control Suite , 2018, ArXiv.

[35]  Xu Ji,et al.  Invariant Information Distillation for Unsupervised Image Segmentation and Clustering , 2018, ArXiv.

[36]  Jonathan Tompson,et al.  Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning , 2018, NeurIPS.

[37]  Tejas D. Kulkarni,et al.  Learning to Control Visual Abstractions for Structured Exploration in Deep Reinforcement Learning , 2018 .

[38]  Andrew Zisserman,et al.  Self-supervised learning of a facial attribute embedding from video , 2018, BMVC.

[39]  Iasonas Kokkinos,et al.  Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance , 2018, ECCV.

[40]  Xu Ji,et al.  Invariant Information Clustering for Unsupervised Image Classification and Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Klaus Greff,et al.  Multi-Object Representation Learning with Iterative Variational Inference , 2019, ICML.

[42]  Anelia Angelova,et al.  Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Kenneth O. Stanley,et al.  Go-Explore: a New Approach for Hard-Exploration Problems , 2019, ArXiv.

[44]  Matthew Botvinick,et al.  MONet: Unsupervised Scene Decomposition and Representation , 2019, ArXiv.

[45]  Sergey Levine,et al.  Model-Based Reinforcement Learning for Atari , 2019, ICLR.

[46]  Danna Zhou,et al.  d. , 1840, Microbial pathogenesis.

[47]  P. Alam ‘G’ , 2021, Composites Engineering: An A–Z Guide.

[48]  P. Alam ‘A’ , 2021, Composites Engineering: An A–Z Guide.