Embodied Learning for Lifelong Visual Perception

We study lifelong visual perception in an embodied setup: we develop new models and compare various agents that navigate in buildings and occasionally request annotations, which in turn are used to refine their visual perception capabilities. The purpose of the agents is to recognize objects and other semantic classes in the whole building at the end of a process that combines exploration and active visual learning. As we study this task in a lifelong learning context, the agents should use knowledge gained in previously visited environments to guide their exploration and active learning strategy in successively visited buildings. We use semantic segmentation performance as a proxy for general visual perception and study this novel task for several exploration and annotation methods, ranging from frontier exploration baselines that use heuristic active learning to a fully learnable approach. For the latter, we introduce a deep reinforcement learning (RL) based agent that jointly learns navigation and active learning. A point goal navigation formulation, coupled with a global planner that supplies goals, is integrated into the RL model to provide further incentives for systematic exploration of novel scenes. By performing extensive experiments on the Matterport3D dataset, we show how the proposed agents can utilize knowledge from previously explored scenes when exploring new ones, e.g., through less granular exploration and less frequent requests for annotations. The results also suggest that a learning-based agent is able to use its prior visual knowledge more effectively than heuristic alternatives.
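A frontier exploration baseline of the kind mentioned in the abstract repeatedly picks a frontier, that is, a known free cell adjacent to unexplored space, as its next navigation goal. The following is a minimal Python sketch of that classic idea on a 2D occupancy grid, not the agents' actual implementation. The cell encoding (0 free, 1 occupied, -1 unknown), the functions `find_frontiers` and `nearest_frontier`, and the Manhattan-distance goal selection are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical cell encoding for an occupancy grid (assumption, not from the paper).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1

Cell = Tuple[int, int]


def find_frontiers(grid: List[List[int]]) -> List[Cell]:
    """Return (row, col) of free cells with at least one unknown 4-neighbor, in row-major order."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    frontiers: List[Cell] = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            # A free cell touching unexplored space is a frontier cell.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def nearest_frontier(grid: List[List[int]], agent: Cell) -> Optional[Cell]:
    """Hypothetical goal selection: the frontier closest to the agent in Manhattan distance.

    Returns None when no frontier remains, i.e. the reachable map is fully explored.
    """
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None
    ar, ac = agent
    return min(frontiers, key=lambda f: abs(f[0] - ar) + abs(f[1] - ac))


if __name__ == "__main__":
    grid = [
        [0, 0, -1],
        [0, 1, -1],
        [0, 0, 0],
    ]
    print(find_frontiers(grid))            # [(0, 1), (2, 2)]
    print(nearest_frontier(grid, (2, 0)))  # (2, 2)
```

Manhattan distance ignores obstacles; a practical planner would rank frontiers by path length over free cells (for example with breadth-first search) and would usually cluster adjacent frontier cells before choosing a goal.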
