Model-based and Model-free Reinforcement Learning for Visual Servoing

To address the difficulty of designing a controller for complex visual-servoing tasks, two learning-based uncalibrated approaches are introduced. The first method begins by estimating a model of the visual-motor forward kinematics of the vision-robot system using a locally linear regression method. It then applies a reinforcement learning method, Regularized Fitted Q-Iteration, to find a controller (i.e., a policy) for the system (model-based RL). The second method learns the controller directly from samples collected from the robot, without building any intermediate model (model-free RL). Simulation results show that the two methods perform comparably well, even though neither has any a priori knowledge of the robot.
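The model-estimation step of the first method can be illustrated with a minimal sketch of locally weighted linear regression. This is not the authors' implementation: the function name `local_linear_predict`, the Gaussian kernel, the `bandwidth` and `ridge` parameters, and the synthetic linear visual-motor map are all assumptions made for illustration. Given sampled joint configurations and the image features observed at each, it fits a weighted linear model around a query configuration and returns the predicted features together with a local Jacobian estimate.

```python
import numpy as np


def local_linear_predict(Q, S, q_query, bandwidth=0.5, ridge=1e-8):
    """Locally weighted linear fit of image features around q_query.

    Q: (n, d) sampled joint configurations (hypothetical data layout).
    S: (n, m) image features observed at those configurations.
    Returns (s_hat, J_hat): predicted features (m,) and local Jacobian (m, d).
    """
    Q = np.asarray(Q, dtype=float)
    S = np.asarray(S, dtype=float)
    q_query = np.asarray(q_query, dtype=float)

    diff = Q - q_query  # centre the samples on the query point
    # Gaussian kernel weights: nearby samples dominate the fit.
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / bandwidth ** 2)

    # Design matrix with an intercept column and the centred inputs.
    X = np.hstack([np.ones((Q.shape[0], 1)), diff])
    XtW = X.T * w  # scales each sample's column by its kernel weight
    A = XtW @ X + ridge * np.eye(X.shape[1])  # small ridge term for stability
    beta = np.linalg.solve(A, XtW @ S)  # shape (d + 1, m)

    s_hat = beta[0]      # intercept = prediction at the query
    J_hat = beta[1:].T   # slopes = local visual-motor Jacobian
    return s_hat, J_hat


# Synthetic example: an assumed linear visual-motor map s = A_true q + b_true.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 2.0], [3.0, -1.0]])
b_true = np.array([0.5, -0.5])
Q_samples = rng.uniform(-1.0, 1.0, size=(200, 2))
S_samples = Q_samples @ A_true.T + b_true

q0 = np.array([0.2, 0.3])
s_hat, J_hat = local_linear_predict(Q_samples, S_samples, q0)
```

On this synthetic linear map the local fit recovers the features and Jacobian almost exactly. On a real robot the map is nonlinear, so the bandwidth would trade off bias against variance and would need tuning. The resulting model could then generate simulated transitions for a batch RL method such as fitted Q-iteration.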
