Reinforcement learning under circumstances beyond its control

Decision theory addresses the task of choosing an action and provides robust criteria for decision-making under uncertainty or risk. It has been applied to produce reinforcement learning algorithms that manage uncertainty in state transitions. However, because reinforcement learning tasks are multiple-step decision problems, performance must also be considered when the selection of future actions is itself uncertain. This work proposes beta-pessimistic Q-learning, a reinforcement learning algorithm that does not assume complete control.
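
The abstract does not state the update rule, so the following is only a minimal sketch under a stated assumption: that a beta-pessimistic target blends the best and worst action values of the next state, weighted by a parameter beta in [0, 1]. The function names, the tabular dictionary representation, and the parameters alpha (learning rate) and gamma (discount factor) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def beta_pessimistic_target(next_q_values, reward, gamma, beta):
    """Assumed target: reward plus a discounted blend of best and worst next values.

    beta = 0 gives the usual Q-learning target (max over next actions);
    beta = 1 gives a fully pessimistic target (min over next actions).
    """
    best = max(next_q_values)
    worst = min(next_q_values)
    return reward + gamma * ((1.0 - beta) * best + beta * worst)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, beta=0.5):
    """Apply one tabular update toward the assumed beta-pessimistic target."""
    next_q = [q[(next_state, a)] for a in actions]
    target = beta_pessimistic_target(next_q, reward, gamma, beta)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: a table keyed by (state, action), defaulting to 0.0.
q = defaultdict(float)
q[("s1", "left")] = 1.0
q[("s1", "right")] = -1.0
new_value = q_update(q, "s0", "left", reward=2.0, next_state="s1",
                     actions=["left", "right"], alpha=0.5, gamma=1.0, beta=0.5)
```

Under this assumption, beta expresses how much the learner distrusts its own control over future actions: a larger beta weights the worst next action more heavily, which lowers the value assigned to states where a wrong future action would be costly.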
