Model-Based Active Learning in Hierarchical Policies

Hierarchical task decompositions play an essential role in the design of complex simulation and decision systems, such as those that arise in video games. Game designers find it natural to adopt a divide-and-conquer philosophy when specifying hierarchical policies, where decision modules can be constructed somewhat independently. Choosing the parameters of these modules by hand, however, is typically a lengthy and tedious process. The hierarchical reinforcement learning (HRL) field has produced elegant ways of decomposing policies and value functions using semi-Markov decision processes, but demonstrations in larger nonlinear systems with both discrete and continuous variables are still lacking. To narrow this gap between industrial practice and academic ideas, we address the problem of designing efficient algorithms that facilitate the deployment of HRL in more realistic settings. In particular, we propose Bayesian active learning methods that learn the relevant aspects of either policies or value functions by focusing on the most relevant parts of the parameter and state spaces, respectively. To demonstrate the scalability of our solution, we have applied it to The Open Racing Car Simulator (TORCS), a 3D game engine that implements complex vehicle dynamics. The environment is a large topological map roughly based on downtown Vancouver, British Columbia.
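To make the Bayesian active learning idea concrete, the sketch below shows one common way such a loop can be set up: a Gaussian-process surrogate is fit to the returns of previously evaluated policy parameters, and an expected-improvement acquisition picks the next parameter to simulate. This is a minimal illustration only; the toy simulator, the 1-D parameter, the RBF kernel width, and the noise level are all assumptions for exposition and do not reproduce the paper's actual TORCS setup or hierarchical decomposition.

```python
# Minimal sketch of Bayesian active policy search over a single policy
# parameter. All specifics (simulator, kernel, noise) are illustrative
# assumptions, not the paper's implementation.
import numpy as np
from scipy.stats import norm

def simulate_return(theta):
    """Stand-in for an expensive, noisy policy rollout (e.g. one racing episode)."""
    return -(theta - 0.3) ** 2 + 0.05 * np.random.randn()

def rbf_kernel(a, b, length=0.2):
    """Squared-exponential kernel between two 1-D arrays of parameters."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(X, y, Xs, noise=1e-3):
    """GP posterior mean and std at query points Xs given observations (X, y)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    Kss = rbf_kernel(Xs, Xs)
    alpha = np.linalg.solve(K, y)
    mu = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.clip(np.diag(Kss - Ks.T @ v), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best return observed so far (maximization)."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# Active learning loop: spend simulations only where they are most informative.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=3)             # initial policy parameters tried
y = np.array([simulate_return(t) for t in X]) # their (noisy) returns
grid = np.linspace(0.0, 1.0, 200)             # candidate parameter settings

for _ in range(10):
    mu, sigma = gp_posterior(X, y, grid)
    ei = expected_improvement(mu, sigma, y.max())
    theta_next = grid[np.argmax(ei)]          # most promising parameter to test next
    X = np.append(X, theta_next)
    y = np.append(y, simulate_return(theta_next))

print("best parameter found:", X[np.argmax(y)], "with return:", y.max())
```

In a hierarchical policy, a loop of this form would be run per decision module (or over the joint parameters of a few modules), so that each expensive simulation is targeted at the parameters whose effect on the return is currently most uncertain or most promising.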
