Self-directed Lifelong Learning for Robot Vision

Efforts towards robust visual scene understanding tend to rely heavily on manual annotations. When human labels are required, collecting a dataset large enough to train a successful robot vision system is almost certain to be prohibitively expensive. We argue, however, that a robot equipped with a vision sensor can learn powerful visual representations in a self-directed manner by exploiting fundamental physical priors and bootstrapping techniques. For example, it has been shown that a basic visual tracker can automatically label short-range correspondences in video, which suffice to train a system with capabilities analogous to object permanence in humans. Such an object-permanence system can in turn automatically label long-range correspondences, enabling the training of a system that compares and contrasts objects and scenes. In the end, the agent develops a representation that encodes persistent properties of the parts of a visual scene, such as material, state, and lighting. Starting from this strong visual representation, the agent can then learn to solve traditional vision tasks such as class and/or instance recognition using only a sparse set of labels found on the Internet or solicited at little cost from humans. More importantly, such a representation would also enable truly robust solutions to core challenges in robotics, including global localization, loop-closure detection, and object pose estimation.
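The first bootstrapping step, mining short-range correspondence labels from a tracker and learning from them with a metric loss, can be sketched minimally as follows. Everything here is illustrative rather than the method of any particular system: the toy video, the stand-in `track` function (a real system would use optical flow or an off-the-shelf tracker), and the use of raw pixels in place of a learned embedding are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": T frames of low-amplitude noise with a bright square
# that drifts diagonally by one pixel per frame.
T, H, W, S = 10, 64, 64, 8
video = rng.random((T, H, W)) * 0.1
for t in range(T):
    video[t, 20 + t:20 + t + S, 20 + t:20 + t + S] += 1.0

def track(video, start=(20, 20)):
    """Stand-in tracker: returns the top-left corner of the tracked
    patch in each frame. Here it simply follows the known diagonal
    drift; a real system would track appearance or motion."""
    y, x = start
    return [(y + t, x + t) for t in range(len(video))]

def crop(frame, yx, size=S):
    y, x = yx
    return frame[y:y + size, x:x + size]

def mine_triplet(video, track_pts):
    """Anchor and positive are the same tracked patch in two frames
    (an automatically labeled short-range correspondence); the
    negative is a background patch away from the track (fixed at the
    frame corner here for determinism)."""
    t0, t1 = 0, len(video) - 1
    anchor = crop(video[t0], track_pts[t0])
    positive = crop(video[t1], track_pts[t1])
    negative = crop(video[t1], (0, 0))
    return anchor, positive, negative

def triplet_loss(a, p, n, margin=1.0):
    """Hinge loss that pulls corresponding patches together and pushes
    non-corresponding ones apart; the 'embedding' is raw pixels."""
    d_ap = np.linalg.norm(a - p)
    d_an = np.linalg.norm(a - n)
    return max(0.0, d_ap - d_an + margin)

a, p, n = mine_triplet(video, track(video))
loss = triplet_loss(a.ravel(), p.ravel(), n.ravel())
```

In a full pipeline, `triplet_loss` would be applied to the outputs of a learned feature network rather than to raw pixels, and millions of such triplets, labeled entirely by the tracker with no human input, would drive the training.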
