Speeding-up reinforcement learning through abstraction and transfer learning

We are interested in the following general question: is it possible to abstract knowledge that is generated while learning the solution to a problem, so that this abstraction can accelerate the learning process? Moreover, is it possible to transfer and reuse the acquired abstract knowledge to accelerate learning on future, similar tasks? We propose a framework that conducts reinforcement learning simultaneously at two levels: an abstract policy is learned while a concrete policy is learned for the problem at hand, and both policies are refined through the agent's exploration of and interaction with the environment. We exploit abstraction both to accelerate the search for an optimal concrete policy for the current problem and to allow the resulting abstract policy to be applied when learning solutions to new problems. Experiments in a robot navigation environment show that our framework is effective in speeding up policy construction for practical problems and in generating abstractions that accelerate learning in new, similar problems.
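
To make the two-level idea concrete, here is a minimal sketch of learning a concrete and an abstract policy simultaneously with tabular Q-learning, where the same experience updates both value tables and the abstract table is what gets transferred to a new task. This is an illustrative assumption, not the paper's algorithm; the environment interface `env`, the abstraction map `abstract_state`, and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

def simultaneous_rl(env, abstract_state, episodes=500,
                    alpha=0.1, gamma=0.95, epsilon=0.1):
    """Sketch of two-level RL: concrete Q over ground states,
    abstract Q over abstracted states, updated from the same experience."""
    q_concrete = defaultdict(float)   # Q-values over ground (concrete) states
    q_abstract = defaultdict(float)   # Q-values over abstracted states

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            z = abstract_state(s)
            # Epsilon-greedy choice; the abstract estimate is added to the
            # concrete one so that abstract knowledge biases exploration
            # (one simple way to combine the two levels, assumed here).
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s),
                        key=lambda a: q_concrete[(s, a)] + q_abstract[(z, a)])

            s_next, r, done = env.step(a)
            z_next = abstract_state(s_next)

            # One-step Q-learning updates at both levels from the same transition.
            best_c = max((q_concrete[(s_next, b)] for b in env.actions(s_next)),
                         default=0.0)
            best_a = max((q_abstract[(z_next, b)] for b in env.actions(s_next)),
                         default=0.0)
            q_concrete[(s, a)] += alpha * (r + gamma * best_c - q_concrete[(s, a)])
            q_abstract[(z, a)] += alpha * (r + gamma * best_a - q_abstract[(z, a)])
            s = s_next

    # q_abstract can be carried over to a similar task to bias its initial
    # exploration (transfer), while q_concrete remains task-specific.
    return q_concrete, q_abstract
```

In this sketch, transfer amounts to reusing the returned `q_abstract` table when starting `simultaneous_rl` on a new task whose states map into the same abstract space; how the abstract policy is represented and combined with the concrete one in the actual framework is described in the paper itself.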
