Quadruped robot obstacle negotiation via reinforcement learning

Legged robots can, in principle, traverse a wide variety of obstacles and terrains. In this paper, we describe a successful application of reinforcement learning to the problem of negotiating obstacles with a quadruped robot. Our algorithm is based on a two-level hierarchical decomposition of the task: the high-level controller selects the sequence of foot-placement positions, and the low-level controller generates the continuous motions that move each foot to its specified position. The high-level controller uses an estimate of the value function to guide its search; this estimate is learned partially from supervised data. The low-level controller is obtained via policy search. We demonstrate that our robot can successfully climb over a variety of obstacles that were not seen at training time.
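The two-level decomposition described above can be illustrated with a minimal sketch. It is not the paper's implementation: the linear value estimate, the three foothold features, and the random hill-climbing routine standing in for policy search are all assumptions made for illustration, and every function and parameter name here is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Foothold = Tuple[float, float]  # (x, y) position of a candidate foot placement


def foothold_features(foot: Foothold, goal: Foothold,
                      terrain_height: Callable[[Foothold], float]) -> List[float]:
    """Hypothetical features: closeness to goal, lowness of terrain, bias."""
    dx, dy = goal[0] - foot[0], goal[1] - foot[1]
    return [-(dx * dx + dy * dy) ** 0.5, -terrain_height(foot), 1.0]


def value_estimate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Assumed linear value estimate; weights could be fit from supervised data."""
    return sum(w * f for w, f in zip(weights, features))


def select_foothold(candidates: Sequence[Foothold], goal: Foothold,
                    terrain_height: Callable[[Foothold], float],
                    weights: Sequence[float]) -> Foothold:
    """High level: pick the candidate foothold with the highest estimated value."""
    return max(
        candidates,
        key=lambda c: value_estimate(
            weights, foothold_features(c, goal, terrain_height)),
    )


def policy_search(cost: Callable[[List[float]], float], theta0: Sequence[float],
                  iters: int = 200, step: float = 0.1,
                  seed: int = 0) -> Tuple[List[float], float]:
    """Low level: random hill-climbing over controller parameters.

    A simple stand-in for policy search; it keeps a perturbed parameter
    vector only when it lowers the cost of the resulting foot motion.
    """
    rng = random.Random(seed)
    best = list(theta0)
    best_cost = cost(best)
    for _ in range(iters):
        trial = [t + rng.gauss(0.0, step) for t in best]
        trial_cost = cost(trial)
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best, best_cost
```

In this sketch the high level scores each candidate foothold once and takes the best, whereas a search guided by the value estimate would typically look several steps ahead. The low-level `cost` callable is a placeholder for whatever measure of foot-motion quality a simulator or the robot itself would return.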
