Vision-based reinforcement learning for humanoid behavior generation with rhythmic walking parameters

This paper presents a method for generating vision-based humanoid behaviors by reinforcement learning with rhythmic walking parameters. The walking is stabilized by a rhythmic motion controller such as a central pattern generator (CPG) or a neural oscillator. The learning process consists of two stages. The first stage builds an action space from two parameters (a forward step length and a turning angle) and excludes the combinations that are not feasible. The second stage applies reinforcement learning to find feasible actions, using the constructed action space and a state space consisting of visual features and posture parameters. The method is applied to a task from the RoboCupSoccer humanoid league [H. Kitano and M. Asada, Advanced Robotics, 2000]: approaching the ball and shooting it into the goal. Human instruction is given only to start up the learning process; the rest is completely self-learning in real situations.
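
The two-stage structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter values, the feasibility rule in `is_feasible`, the string state labels, and the use of tabular Q-learning with epsilon-greedy selection are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from itertools import product

# Hypothetical discretization of the two walking parameters.
STEP_LENGTHS = [0.0, 0.02, 0.04]   # forward step length in meters (assumed values)
TURN_ANGLES = [-20, 0, 20]         # turning angle in degrees (assumed values)


def is_feasible(step, turn):
    """Assumed rule: the longest step cannot be combined with a turn."""
    return not (step == max(STEP_LENGTHS) and turn != 0)


def build_action_space():
    """Stage 1: keep only the feasible (step, turn) combinations."""
    return [(s, t) for s, t in product(STEP_LENGTHS, TURN_ANGLES)
            if is_feasible(s, t)]


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Stage 2: one tabular Q-learning update restricted to feasible actions."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def select_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice among the feasible actions only."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


ACTIONS = build_action_space()
```

In a real system the state would be built from visual features and posture parameters, as the abstract describes; the placeholder state labels here only show where that information would enter.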
