A developmental actor-critic reinforcement learning approach for task-nonspecific robot

For task-nonspecific robots working in changing environments, most research on developmental learning based on cognitive psychology advocates a staged developmental process and an explicit hierarchical action model. Although existing developmental learning approaches have made progress, two open problems remain: (a) when numerous tasks are involved, the learning speed is not always satisfactory; and (b) when these tasks are not specified in advance, the hierarchical action model is hard to design beforehand or to learn automatically. To address these two problems, this paper proposes a new developmental reinforcement learning approach, presented with its model and algorithms. In the model, any actor-critic learning algorithm can be encapsulated as a learning infrastructure to build an implicit action model, called the reward-policy mapping, and a self-motivated module drives the robot's autonomous learning. The proposed approach effectively supports an autonomous, interactive, cumulative, and online learning process for task-nonspecific robots. Simulation results show that, to learn to perform nearly twenty thousand tasks, the proposed approach needs only half the time required by its counterpart, the encapsulated actor-critic learning algorithm alone.
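The abstract refers to encapsulating an arbitrary actor-critic learner as the underlying infrastructure. As a rough illustration of what such an encapsulated learner does (not the paper's own model; the chain environment, state count, and step sizes below are invented for the example), here is a minimal one-step actor-critic sketch with a softmax actor and a TD(0) critic:

```python
import math
import random

# Minimal one-step actor-critic sketch on a toy 5-state chain MDP.
# Illustrative only: the environment and hyperparameters are assumptions,
# not taken from the paper. States 0..4; action 0 = left, 1 = right;
# reward +1 on reaching the goal state 4.

N_STATES, GOAL = 5, 4
ALPHA_V, ALPHA_PI, GAMMA = 0.1, 0.1, 0.9

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(prefs, rng):
    pi = softmax(prefs)
    r, acc = rng.random(), 0.0
    for a, p in enumerate(pi):
        acc += p
        if r <= acc:
            return a
    return len(pi) - 1

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N_STATES                            # critic: state values
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: action preferences
    for _ in range(episodes):
        s = 0
        for _ in range(50):
            a = sample_action(prefs[s], rng)
            s2, reward, done = step(s, a)
            target = reward if done else reward + GAMMA * V[s2]
            delta = target - V[s]                   # TD error drives both updates
            V[s] += ALPHA_V * delta                 # critic update
            pi = softmax(prefs[s])                  # actor: softmax policy-gradient step
            for b in range(2):
                prefs[s][b] += ALPHA_PI * delta * ((1.0 if b == a else 0.0) - pi[b])
            s = s2
            if done:
                break
    return V, prefs
```

In the paper's setting, a learner of this kind is treated as a black box: the reward-policy mapping stores the policies such a learner produces, indexed by the reward (task) that induced them, so that previously learned tasks need not be relearned from scratch.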
