Efforts toward robust visual scene understanding tend to rely heavily on manual annotations. When human labels are required, collecting a dataset large enough to train a successful robot vision system is almost certain to be prohibitively expensive. We argue, however, that a robot with a vision sensor can learn powerful visual representations in a self-directed manner by relying on fundamental physical priors and bootstrapping techniques. For example, it has been shown that basic visual tracking systems can automatically label short-range correspondences in video, which makes it possible to train a system with capabilities analogous to object permanence in humans. An object permanence system can in turn be used to automatically label long-range correspondences, making it possible to train a system that can compare and contrast objects and scenes. In the end, the agent will develop a representation that encodes the persistent material properties, state, lighting, and other attributes of the various parts of a visual scene. Starting from this strong visual representation, the agent can then learn to solve traditional vision tasks such as class or instance recognition using only a sparse set of labels, which can be found on the Internet or solicited from humans at little cost. More importantly, such a representation would also enable truly robust solutions to robotics challenges such as global localization, loop closure detection, and object pose estimation.