Paper information - Reinforcement learning under circumstances beyond its control


Decision theory addresses the task of choosing an action; it provides robust criteria that support decision-making under conditions of uncertainty or risk. Decision theory has been applied to produce reinforcement learning algorithms that manage uncertainty in state transitions. However, because reinforcement learning tasks are multiple-step decision problems, performance under uncertainty about the selection of future actions must also be considered. This work proposes beta-pessimistic Q-learning, a reinforcement learning algorithm that does not assume complete control.
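The abstract names beta-pessimistic Q-learning but does not state its update rule. The following is a minimal sketch under a stated assumption: that the learning target blends the best and the worst next-state action values, weighted by a parameter beta between 0 and 1. The function names, the tabular dictionary representation, and the default values of alpha, gamma, and beta are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def beta_pessimistic_target(next_values, reward, gamma, beta):
    """Return a learning target that mixes best and worst next-state values.

    Assumed form (not stated in the abstract):
        reward + gamma * ((1 - beta) * max + beta * min)
    beta = 0 reduces to the usual Q-learning target (max only);
    beta = 1 uses only the worst next-state action value.
    """
    if not next_values:  # terminal transition: no future value to add
        return reward
    best = max(next_values)
    worst = min(next_values)
    return reward + gamma * ((1.0 - beta) * best + beta * worst)


def update(q, state, action, reward, next_state, actions,
           alpha=0.1, gamma=0.9, beta=0.5, terminal=False):
    """Apply one tabular update to q, a dict keyed by (state, action)."""
    next_values = [] if terminal else [q[(next_state, a)] for a in actions]
    target = beta_pessimistic_target(next_values, reward, gamma, beta)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: a table that defaults every unseen entry to 0.0.
q = defaultdict(float)
update(q, "s0", "left", 1.0, "s1", ["left", "right"], alpha=0.5)
```

Under this assumption, raising beta makes the learner weigh more heavily the possibility that a future action is not the one it would have chosen, which is one way to read the abstract's phrase "does not assume complete control".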

Chris Gaskett
