A developmental actor-critic reinforcement learning approach for task-nonspecific robot

For task-nonspecific robots working in changing environments, most research on developmental learning based on cognitive psychology advocates a staged developmental process and an explicit hierarchical action model. Although existing developmental learning approaches have made progress, two open problems remain: (a) when numerous tasks are involved, the learning speed is not always satisfactory; and (b) when these tasks are not specified in advance, the hierarchical action model is hard to design beforehand or to learn automatically. To address these two problems, this paper proposes a new developmental reinforcement learning approach, presented with its model and algorithms. In the model, any actor-critic learning algorithm can be encapsulated as a learning infrastructure to build an implicit action model, called the reward-policy mapping, and a self-motivated module drives the robot's autonomous learning. The proposed approach effectively supports an autonomous, interactive, cumulative, and online learning process for task-nonspecific robots. Simulation results show that, to learn to perform nearly twenty thousand tasks, the proposed approach needs only half the time required by its counterpart, the encapsulated actor-critic learning algorithm alone.
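The abstract refers to encapsulating an arbitrary actor-critic learner as the underlying infrastructure. As a rough illustration of what such an encapsulated learner does (not the paper's own model; the chain environment, state count, and step sizes below are invented for the example), here is a minimal one-step actor-critic sketch with a softmax actor and a TD(0) critic:

```python
import math
import random

# Minimal one-step actor-critic sketch on a toy 5-state chain MDP.
# Illustrative only: the environment and hyperparameters are assumptions,
# not taken from the paper. States 0..4; action 0 = left, 1 = right;
# reward +1 on reaching the goal state 4.

N_STATES, GOAL = 5, 4
ALPHA_V, ALPHA_PI, GAMMA = 0.1, 0.1, 0.9

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(prefs, rng):
    pi = softmax(prefs)
    r, acc = rng.random(), 0.0
    for a, p in enumerate(pi):
        acc += p
        if r <= acc:
            return a
    return len(pi) - 1

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N_STATES                            # critic: state values
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: action preferences
    for _ in range(episodes):
        s = 0
        for _ in range(50):
            a = sample_action(prefs[s], rng)
            s2, reward, done = step(s, a)
            target = reward if done else reward + GAMMA * V[s2]
            delta = target - V[s]                   # TD error drives both updates
            V[s] += ALPHA_V * delta                 # critic update
            pi = softmax(prefs[s])                  # actor: softmax policy-gradient step
            for b in range(2):
                prefs[s][b] += ALPHA_PI * delta * ((1.0 if b == a else 0.0) - pi[b])
            s = s2
            if done:
                break
    return V, prefs
```

In the paper's setting, a learner of this kind is treated as a black box: the reward-policy mapping stores the policies such a learner produces, indexed by the reward (task) that induced them, so that previously learned tasks need not be relearned from scratch.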
