Intrinsically motivated model learning for a developing curious agent

Reinforcement Learning (RL) agents are typically deployed to learn a specific, concrete task based on a pre-defined reward function. However, in some cases an agent may be able to gain experience in the domain prior to being given a task. In such cases, intrinsic motivation can be used to enable the agent to learn a useful model of the environment that is likely to help it learn its eventual tasks more efficiently. This paper presents the TEXPLORE with Variance-And-Novelty-Intrinsic-Rewards algorithm (TEXPLORE-VANIR), an intrinsically motivated model-based RL algorithm. The algorithm learns models of the transition dynamics of a domain using random forests. It calculates two different intrinsic motivations from this model: one to explore where the model is uncertain, and one to acquire novel experiences that the model has not yet been trained on. This paper presents experiments demonstrating that the combination of these two intrinsic rewards enables the algorithm to learn an accurate model of a domain with no external rewards and that the learned model can be used afterward to perform tasks in the domain. While learning the model, the agent explores the domain in a developing and curious way, progressively learning more complex skills. In addition, the experiments show that combining the agent's intrinsic rewards with external task rewards enables the agent to learn faster than using external rewards alone.

[1]  Jürgen Schmidhuber,et al.  Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[2]  Olivier Sigaud,et al.  Learning the structure of Factored Markov Decision Processes in reinforcement learning problems , 2006, ICML.

[3]  Pierre-Yves Oudeyer,et al.  R-IAC: Robust Intrinsically Motivated Exploration and Active Learning , 2009, IEEE Transactions on Autonomous Mental Development.

[4]  Andrew G. Barto,et al.  Active Learning of Dynamic Bayesian Networks in Markov Decision Processes , 2007, SARA.

[5]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[6]  Csaba Szepesvári,et al.  Bandit Based Monte-Carlo Planning , 2006, ECML.

[7]  Andrew G. Barto,et al.  Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.

[8]  Peter Stone,et al.  Real time targeted exploration in large domains , 2010, 2010 IEEE 9th International Conference on Development and Learning.

[9]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[10]  Nuttapong Chentanez,et al.  Intrinsically Motivated Reinforcement Learning , 2004, NIPS.

[11]  Andrew G. Barto,et al.  An intrinsic reward mechanism for efficient exploration , 2006, ICML.

[12]  Richard L. Lewis,et al.  Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.

[13]  Michael L. Littman,et al.  Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.

[14]  Peter Stone,et al.  Structure Learning in Ergodic Factored MDPs without Knowledge of the Transition Function's In-Degree , 2011, ICML.

[15]  Michael O. Duff,et al.  Design for an Optimal Probe , 2003, ICML.

[16]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[17]  Pierre-Yves Oudeyer,et al.  Robust intrinsically motivated exploration and active learning , 2009, 2009 IEEE 8th International Conference on Development and Learning.

[18]  Peter Stone,et al.  RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control , 2011, 2012 IEEE International Conference on Robotics and Automation.

[19]  Peter Stone,et al.  Model-based function approximation in reinforcement learning , 2007, AAMAS '07.

[20]  Pierre-Yves Oudeyer,et al.  Guest Editorial Active Learning and Intrinsically Motivated Exploration in Robots: Advances and Challenges , 2010, IEEE Trans. Auton. Ment. Dev..

[21]  Andrew G. Barto,et al.  Competence progress intrinsic motivation , 2010, 2010 IEEE 9th International Conference on Development and Learning.

[22]  Dale Schuurmans,et al.  Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs , 2002, ICML.

[23]  Alan F. Murray,et al.  International Joint Conference on Neural Networks , 1993 .

[24]  Pierre-Yves Oudeyer,et al.  Intrinsic Motivation Systems for Autonomous Mental Development , 2007, IEEE Transactions on Evolutionary Computation.

[25]  Andrew G. Barto,et al.  Intrinsically Motivated Hierarchical Skill Learning in Structured Environments , 2010, IEEE Transactions on Autonomous Mental Development.

[26]  Chris Watkins,et al.  Learning from delayed rewards , 1989 .

[27]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.