Learning and planning in structured worlds

This thesis is concerned with the problem of how to make decisions in an uncertain world. We model uncertainty using Markov decision problems and develop a number of decision-making algorithms, both for the planning problem, in which the model is known in advance, and for the reinforcement learning problem, in which the decision-making agent does not know the model and must learn to make good decisions by trial and error.

The basis for much of this work is the use of structured representations of problems. If a problem is represented in a structured way, we can compute or learn plans that exploit this structure for computational gains, because the structure allows us to perform abstraction. Rather than reasoning individually about each situation in which a decision must be made, abstraction lets us group situations together and reason about a whole set of them in a single step. Our approach to abstraction has the additional advantage that the level of abstraction can be changed dynamically: a group of situations is split in two if its members must be reasoned about separately to find an acceptable plan, and two groups are merged if they no longer need to be distinguished. We present two planning algorithms and one learning algorithm that use this approach.

The second idea we present in this thesis is a novel approach to the exploration problem in reinforcement learning: how to select actions so as to perform well both now and in the future. We can select the action that currently appears best, but this may prevent us from discovering that another action is better; or we can take an exploratory action, at the risk of performing poorly now as a result. Our Bayesian approach makes this tradeoff explicit by representing our uncertainty about the values of states and using this measure of uncertainty to estimate the value of the information we could gain by performing each action. We present both model-free and model-based reinforcement learning algorithms that make use of this exploration technique.

Finally, we show how these ideas fit together to produce a reinforcement learning algorithm that uses structure to represent both the problem being solved and the plan it learns, and that uses our Bayesian approach to exploration to select the actions it performs while learning.
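To make the abstraction idea concrete, the following is a minimal sketch, in Python, of value iteration over groups of states: a group is split when its members' Bellman backups disagree by more than a tolerance, and the symmetric merge of groups whose values reconverge is noted but omitted. The toy transition table `P`, reward function `R`, and the `SPLIT_TOL` threshold are illustrative assumptions; the algorithms in the thesis operate on structured, factored problem descriptions rather than an enumerated state space.

```python
# Minimal sketch of abstraction by state aggregation (not the thesis's
# structured algorithms): value iteration over "blocks" of states, splitting
# a block whenever its members' Bellman backups disagree by more than a
# tolerance.  The MDP below and SPLIT_TOL are illustrative assumptions.
from collections import defaultdict

# Toy MDP: P[s][a] is a list of (next_state, probability); R[s] is the reward.
P = {
    's0': {'a': [('s1', 1.0)], 'b': [('s2', 1.0)]},
    's1': {'a': [('s1', 1.0)], 'b': [('s2', 1.0)]},
    's2': {'a': [('s2', 1.0)], 'b': [('s2', 1.0)]},
}
R = {'s0': 0.0, 's1': 0.0, 's2': 1.0}
GAMMA, SPLIT_TOL = 0.9, 1e-3

blocks = [set(P)]            # start fully aggregated: one block of all states
V = defaultdict(float)       # current value estimates

for _ in range(100):
    new_blocks = []
    for block in blocks:
        # One Bellman backup for every state in the block.
        backups = {
            s: R[s] + GAMMA * max(
                sum(p * V[s2] for s2, p in P[s][a]) for a in P[s])
            for s in block
        }
        for s in block:
            V[s] = backups[s]
        lo, hi = min(backups.values()), max(backups.values())
        if hi - lo <= SPLIT_TOL:
            # The states still agree, so keep reasoning about them as a group.
            new_blocks.append(block)
        else:
            # The states need to be distinguished: split the block in two.
            # (Merging blocks whose values reconverge is omitted for brevity.)
            mid = (lo + hi) / 2.0
            low = {s for s in block if backups[s] <= mid}
            new_blocks.extend(b for b in (low, block - low) if b)
    blocks = new_blocks

print(blocks)     # final grouping of states
print(dict(V))    # converged value estimates
```

In this toy problem `s0` and `s1` remain grouped throughout, because their backed-up values never diverge, while `s2` is split off in the first iteration.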
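The exploration idea can be sketched in the same spirit. The fragment below ranks actions by mean estimated value plus an estimate of the value of perfect information about that action, under the simplifying assumption that each action's value estimate is an independent normal posterior; the action names and numbers in `example` are hypothetical, and the posteriors used in the thesis are richer than a plain normal.

```python
# Minimal sketch of value-of-information action selection, assuming each
# action's value estimate is an independent normal posterior (mean, std).
# The action names and numbers in `example` are hypothetical.
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_gain_above(mu, sigma, c):
    """E[max(0, X - c)] for X ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(0.0, mu - c)
    z = (mu - c) / sigma
    return (mu - c) * normal_cdf(z) + sigma * normal_pdf(z)

def expected_shortfall_below(mu, sigma, c):
    """E[max(0, c - X)] for X ~ N(mu, sigma^2)."""
    return expected_gain_above(-mu, sigma, -c)

def select_action(posteriors):
    """posteriors: {action: (mean, std)} for the current state.
    Returns the action maximising mean value plus the value of perfect
    information (VPI) about that action."""
    ranked = sorted(posteriors, key=lambda a: posteriors[a][0], reverse=True)
    best, second = ranked[0], ranked[1]
    q1, q2 = posteriors[best][0], posteriors[second][0]

    scores = {}
    for a, (mu, sigma) in posteriors.items():
        if a == best:
            # Information about the best action matters only if its true
            # value turns out to be below the second-best estimate q2.
            vpi = expected_shortfall_below(mu, sigma, q2)
        else:
            # Information about any other action matters only if its true
            # value turns out to exceed the current best estimate q1.
            vpi = expected_gain_above(mu, sigma, q1)
        scores[a] = mu + vpi
    return max(scores, key=scores.get)

# Hypothetical posteriors over action values in a single state.
example = {'left': (1.0, 0.1), 'right': (0.9, 0.8), 'wait': (0.2, 0.05)}
print(select_action(example))   # prefers the uncertain action 'right'
```

With these numbers the highly uncertain action `right` is preferred over the action with the highest mean, which is exactly the exploration tradeoff the Bayesian approach makes explicit.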
