Learning Visual Servoing with Deep Features and Fitted Q-Iteration

Visual servoing involves choosing actions that move a robot in response to observations from a camera, in order to reach a goal configuration in the world. Standard visual servoing approaches typically rely on manually designed features and analytical dynamics models, which limits their ability to generalize and often requires extensive application-specific feature and model engineering. In this work, we study how learned visual features, learned predictive dynamics models, and reinforcement learning can be combined to learn visual servoing mechanisms. We focus on target following, with the goal of designing algorithms that can learn a visual servo from small amounts of data of the target in question, enabling quick adaptation to new targets. Our approach servos the camera in the space of learned visual features, rather than image pixels or manually designed keypoints. We demonstrate that standard deep features, in our case taken from a model trained for object classification, can be combined with a bilinear predictive model to learn an effective visual servo that is robust to visual variation, changes in viewing angle and appearance, and occlusions. A key component of our approach is a sample-efficient fitted Q-iteration algorithm that learns which features are best suited for the task at hand. We show that we can learn an effective visual servo on a complex synthetic car-following benchmark using just 20 training trajectories for reinforcement learning. The resulting servo substantially outperforms conventional approaches based on image pixels or hand-designed keypoints, and it improves sample efficiency by more than two orders of magnitude over standard model-free deep reinforcement learning algorithms. Videos are available at this http URL.
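To make the pipeline concrete, the sketch below illustrates the two model-based ingredients in numpy: a bilinear one-step model of how visual features respond to actions, and the weighted closed-form controller that follows from it. This is a minimal illustration under simplifying assumptions, not the authors' implementation: deep feature extraction (e.g. convolutional activations from a classification network) is abstracted into precomputed vectors, and the function names (`fit_bilinear`, `servo_action`) are hypothetical.

```python
import numpy as np


def fit_bilinear(Y, U, Y_next, reg=1e-3):
    """Least-squares fit of the bilinear model y' = y + sum_j u_j * (A_j y + b_j).

    Y, Y_next: (N, d) feature vectors before/after each action.
    U:         (N, k) actions.
    Returns A: (k, d, d) and b: (k, d).
    """
    N, d = Y.shape
    k = U.shape[1]
    # Regressors: each sample contributes u_j * [y, 1], stacked over j.
    Y1 = np.hstack([Y, np.ones((N, 1))])                 # (N, d+1)
    X = (U[:, :, None] * Y1[:, None, :]).reshape(N, -1)  # (N, k*(d+1))
    T = Y_next - Y                                       # regression targets
    # Ridge-regularized normal equations.
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ T)
    W = W.reshape(k, d + 1, d)
    return W[:, :d, :].transpose(0, 2, 1), W[:, d, :]    # A_j, b_j


def servo_action(y, y_goal, A, b, w, lam=1.0):
    """Action minimizing || diag(w)^1/2 (y_pred(u) - y_goal) ||^2 + lam ||u||^2.

    The prediction y_pred(u) = y + M(y) u is linear in u, with column j of
    M(y) equal to A_j y + b_j, so the minimizer is a small closed-form
    weighted ridge-regression solution.
    """
    M = np.stack([Aj @ y + bj for Aj, bj in zip(A, b)], axis=1)  # (d, k)
    MtW = M.T * w                                                # M^T diag(w)
    return np.linalg.solve(MtW @ M + lam * np.eye(M.shape[1]), MtW @ (y_goal - y))


if __name__ == "__main__":
    # Synthetic sanity check: fit the model on random bilinear dynamics.
    rng = np.random.default_rng(0)
    d, k, N = 8, 2, 500
    A_true = 0.1 * rng.normal(size=(k, d, d))
    b_true = 0.1 * rng.normal(size=(k, d))
    Y = rng.normal(size=(N, d))
    U = rng.normal(size=(N, k))
    Y_next = Y + np.einsum('nk,kde,ne->nd', U, A_true, Y) + U @ b_true
    A, b = fit_bilinear(Y, U, Y_next)
    u = servo_action(Y[0], np.zeros(d), A, b, w=np.ones(d), lam=0.1)
    print("servoing action:", u)
```

Given the current and goal feature vectors, `servo_action` returns the action that minimizes the predicted weighted feature error plus an action-magnitude penalty; because the model is linear in the action, this is solved exactly with one small linear system rather than an iterative optimization.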

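The remaining ingredient is learning the per-feature weights. The paper does this with a sample-efficient fitted Q-iteration scheme; the sketch below is a generic batch linear FQI in the cost-minimizing form, not the authors' exact parametrization (their greedy action is computed in closed form by the servoing controller). The feature map `phi` and the finite candidate-action set used for the greedy backup are assumptions made to keep the example self-contained.

```python
import numpy as np


def fitted_q_iteration(D, phi, actions, gamma=0.9, iters=20, reg=1e-3):
    """Batch linear FQI: Q(s, a) = phi(s, a) @ theta, refit against
    bootstrapped cost-to-go targets.

    D:       list of (s, a, cost, s_next) transitions.
    phi:     feature map returning a 1-D numpy array.
    actions: finite set of candidate actions for the greedy backup.
    """
    dim = phi(D[0][0], D[0][1]).shape[0]
    theta = np.zeros(dim)
    Phi = np.array([phi(s, a) for s, a, _, _ in D])
    for _ in range(iters):
        # Bellman targets: immediate cost plus discounted greedy successor value.
        targets = np.array([
            c + gamma * min(phi(s2, a2) @ theta for a2 in actions)
            for _, _, c, s2 in D
        ])
        # Each iteration is one ridge-regularized least-squares refit.
        theta = np.linalg.solve(Phi.T @ Phi + reg * np.eye(dim), Phi.T @ targets)
    return theta
```

Since every iteration reduces to a single regression over the fixed batch, the data requirement is set by the regression problem rather than by online exploration, which is consistent with the abstract's claim that a handful of trajectories (20 in the car-following benchmark) can suffice where model-free deep RL needs orders of magnitude more experience.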