Paper information - Non-Markovian Policies in Sequential Decision Problems

Non-Markovian Policies in Sequential Decision Problems

In this article we prove the validity of the Bellman Optimality Equation and related results for sequential decision problems with a general recursive structure. The characteristic feature of our approach is that non-Markovian policies are also taken into account. The theory is motivated by some experiments with a learning robot.

Csaba Szepesvári

[1] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.

[2] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.

[3] R. Bellman, et al. Dynamic Programming and Markov Processes, 1960.

[4] S. Verdú, et al. Abstract dynamic programming models under commutativity conditions, 1987.

[5] Onésimo Hernández-Lerma, et al. Controlled Markov Processes, 1965.

[6] D. Bertsekas. Monotone Mappings with Application in Dynamic Programming, 1977.

[7] Csaba Szepesvári, et al. Module Based Reinforcement Learning for a Real Robot, 1997.

[8] Csaba Szepesvári, et al. Learning and Exploitation Do Not Conflict Under Minimax Optimality, 1997, ECML.