Paper information - Quadruped robot obstacle negotiation via reinforcement learning

Quadruped robot obstacle negotiation via reinforcement learning

Legged robots can, in principle, traverse a large variety of obstacles and terrains. In this paper, we describe a successful application of reinforcement learning to the problem of negotiating obstacles with a quadruped robot. Our algorithm is based on a two-level hierarchical decomposition of the task, in which the high-level controller selects the sequence of foot-placement positions, and the low-level controller generates the continuous motions to move each foot to the specified positions. The high-level controller uses an estimate of the value function to guide its search; this estimate is learned partially from supervised data. The low-level controller is obtained via policy search. We demonstrate that our robot can successfully climb over a variety of obstacles that were not seen at training time.
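The two-level decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D foot state, the features and weights in `value_estimate`, the greedy one-step search in `plan_footsteps`, and the parabolic `swing_trajectory` are all hypothetical stand-ins. In the paper the value estimate is learned partially from supervised data and the low-level controller is obtained via policy search; here the weights are given by hand and `clearance` marks the kind of parameter a policy search might tune.

```python
from typing import Callable, List, Sequence, Tuple

Foot = Tuple[float, float]            # hypothetical foot state: (forward x, height z)
State = Tuple[Foot, Foot, Foot, Foot]


def candidate_steps(state: State, foot: int, terrain: Callable[[float], float],
                    strides: Sequence[float] = (0.05, 0.10, 0.15)) -> List[State]:
    """Enumerate next states reached by moving one foot forward onto the terrain."""
    out = []
    for stride in strides:
        x = state[foot][0] + stride
        new_foot = (x, terrain(x))
        out.append(state[:foot] + (new_foot,) + state[foot + 1:])
    return out


def value_estimate(state: State, weights: Sequence[float]) -> float:
    """Linear value estimate over hand-picked features (an assumption, not the paper's)."""
    xs = [f[0] for f in state]
    zs = [f[1] for f in state]
    features = (
        sum(xs) / 4.0,             # mean forward progress
        -(max(xs) - min(xs)),      # penalise an over-stretched stance
        -(max(zs) - min(zs)),      # penalise uneven foot heights
    )
    return sum(w * f for w, f in zip(weights, features))


def plan_footsteps(state: State, terrain: Callable[[float], float],
                   weights: Sequence[float], n_steps: int,
                   gait: Sequence[int] = (0, 1, 2, 3)):
    """High level: greedily pick, for each foot in turn, the placement with the best value."""
    plan = []
    for i in range(n_steps):
        foot = gait[i % len(gait)]
        state = max(candidate_steps(state, foot, terrain),
                    key=lambda s: value_estimate(s, weights))
        plan.append((foot, state[foot]))
    return plan, state


def swing_trajectory(start: Foot, goal: Foot,
                     clearance: float = 0.03, n: int = 5) -> List[Foot]:
    """Low level stand-in: a parabolic arc from start to goal with extra mid-swing height."""
    points = []
    for k in range(n + 1):
        t = k / n
        x = start[0] + t * (goal[0] - start[0])
        z = start[1] + t * (goal[1] - start[1]) + 4.0 * clearance * t * (1.0 - t)
        points.append((x, z))
    return points
```

A caller would first run `plan_footsteps` on a terrain height function to get a list of `(foot, target)` pairs, then pass each target to `swing_trajectory` to obtain the continuous foot motion. A deeper search than the greedy one shown here could reuse `value_estimate` to score the leaves of a lookahead tree.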
