Paper information - Model-based and Model-free Reinforcement Learning for Visual Servoing

To address the difficulty of designing a controller for complex visual-servoing tasks, two learning-based uncalibrated approaches are introduced. The first method begins by estimating a model of the visual-motor forward kinematics of the vision-robot system using locally linear regression. It then applies a reinforcement learning method, Regularized Fitted Q-Iteration, to find a controller (i.e., a policy) for the system (model-based RL). The second method learns directly from samples collected from the robot, without building any intermediate model (model-free RL). Simulation results show that both methods perform comparably well despite having no a priori knowledge of the robot.
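To make the model-free variant concrete, the following is a minimal sketch of fitted Q-iteration with an L2 (ridge) penalty on a one-dimensional toy servoing problem. It is not the authors' implementation: the toy dynamics in `toy_step`, the discrete action set, the polynomial features, the reward, and the constants `GAMMA` and `RIDGE` are all assumptions chosen for illustration, and plain ridge regression stands in for whatever regularized regressor the paper uses.

```python
import numpy as np

ACTIONS = np.array([-0.1, 0.0, 0.1])  # hypothetical joint increments
GAMMA = 0.8                            # discount factor (assumed)
RIDGE = 1e-3                           # L2 regularization weight (assumed)

def toy_step(error, action):
    """Stand-in for the unknown vision-robot system (assumption).

    The learner never sees the 0.8 gain; it only observes transitions.
    """
    return error + 0.8 * action

def features(error):
    """Polynomial features [1, e, e^2] of the image-feature error."""
    error = np.atleast_1d(error)
    return np.stack([np.ones_like(error), error, error**2], axis=1)

def collect_samples(n, rng):
    """Draw random (error, action, reward, next_error) transitions."""
    errors = rng.uniform(-1.0, 1.0, size=n)
    action_idx = rng.integers(len(ACTIONS), size=n)
    next_errors = toy_step(errors, ACTIONS[action_idx])
    rewards = -next_errors**2  # penalize remaining image error
    return errors, action_idx, rewards, next_errors

def q_values(weights, error):
    """Q(error, a) for every action; result has shape (n, num_actions)."""
    return features(error) @ weights.T

def fitted_q_iteration(samples, n_iter=50):
    """Repeatedly regress Bellman targets onto the features, one ridge fit per action."""
    errors, action_idx, rewards, next_errors = samples
    phi = features(errors)
    weights = np.zeros((len(ACTIONS), phi.shape[1]))
    for _ in range(n_iter):
        # Bellman backup using the current Q estimate.
        targets = rewards + GAMMA * q_values(weights, next_errors).max(axis=1)
        new_weights = np.zeros_like(weights)
        for a in range(len(ACTIONS)):
            mask = action_idx == a
            gram = phi[mask].T @ phi[mask] + RIDGE * np.eye(phi.shape[1])
            new_weights[a] = np.linalg.solve(gram, phi[mask].T @ targets[mask])
        weights = new_weights
    return weights

def greedy_action(weights, error):
    """Policy: pick the action with the largest estimated Q-value."""
    return float(ACTIONS[np.argmax(q_values(weights, error)[0])])

rng = np.random.default_rng(0)
weights = fitted_q_iteration(collect_samples(2000, rng))
print(greedy_action(weights, 0.5), greedy_action(weights, -0.5))
```

In this sketch the samples come directly from the (simulated) system, which corresponds to the model-free setting. Under the same assumptions, a model-based variant would first fit a forward model to observed transitions and then generate the training samples from that model instead of from `toy_step`.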
