Critic Guided Segmentation of Rewarding Objects in First-Person Views

We present a learning approach for masking rewarding objects in images using only the sparse reward signals of an imitation learning dataset. To this end, we train an Hourglass network using only feedback from a critic model. Given a pair of images, one that the critic scores high and one that it scores low, the Hourglass network learns to produce a mask such that swapping the masked areas between the two images decreases the critic's score of the high-score image and increases the critic's score of the low-score image. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where it learned to mask rewarding objects in a complex, interactive 3D environment with a sparse reward signal. This approach was part of the first-place solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation
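The swap objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `swap_masked_regions` and `swap_loss`, the soft-mask blending, the plain score difference used as the loss, and the toy critic are all assumptions made for illustration, and the actual training objective may include further terms.

```python
import numpy as np

def swap_masked_regions(high_img, low_img, mask):
    """Exchange the masked pixels between two images.

    mask holds values in [0, 1] and must be broadcastable to the image shape;
    1 means "take this pixel from the other image".
    """
    high_swapped = (1.0 - mask) * high_img + mask * low_img
    low_swapped = (1.0 - mask) * low_img + mask * high_img
    return high_swapped, low_swapped

def swap_loss(critic, high_img, low_img, mask):
    """Assumed loss: lower when the swap reduces the critic's score of the
    former high-score image and raises it for the former low-score image."""
    high_swapped, low_swapped = swap_masked_regions(high_img, low_img, mask)
    return critic(high_swapped) - critic(low_swapped)

# Toy setup (hypothetical): the "rewarding object" is one bright pixel,
# and the stand-in critic simply sums pixel intensities.
toy_critic = lambda img: float(img.sum())

high_img = np.zeros((4, 4))
high_img[1, 1] = 1.0          # image containing the rewarding object
low_img = np.zeros((4, 4))    # image without it

good_mask = np.zeros((4, 4))
good_mask[1, 1] = 1.0         # mask that covers the object
empty_mask = np.zeros((4, 4)) # mask that covers nothing

loss_good = swap_loss(toy_critic, high_img, low_img, good_mask)
loss_empty = swap_loss(toy_critic, high_img, low_img, empty_mask)
```

In this toy case the mask covering the object yields a lower loss than the empty mask, which is the pressure that would push a mask-producing network toward rewarding objects. In practice the critic and the mask network would be differentiable models trained with gradient descent rather than fixed functions.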
