Regionalized Policy Representation for Reinforcement Learning in POMDPs

Many decision-making problems can be formulated in the framework of a partially observable Markov decision process (POMDP) [5]. The quality of the resulting decisions depends on the accuracy of the POMDP model as well as on the policy computed for that model. In many applications the model is unknown and must be learned empirically from experience, and building the model is often just as difficult as finding the associated policy. Since the ultimate goal of decision making is an optimal policy, it is advantageous to learn the policy directly from experience, without an intervening stage of model learning.
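For concreteness, the standard formalization of the POMDP framework referenced above is sketched here; the symbols S, A, O, T, Ω, R, and γ follow common convention in the literature rather than notation introduced in this paper.

\[
  \mathcal{M} = \langle S, A, O, T, \Omega, R, \gamma \rangle,
  \qquad
  T(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\, a_t = a),
  \qquad
  \Omega(o \mid s', a) = \Pr(o_{t+1} = o \mid s_{t+1} = s',\, a_t = a),
\]

where \(S\), \(A\), and \(O\) are the sets of hidden states, actions, and observations, \(R(s,a)\) is the reward function, and \(\gamma \in [0,1)\) is the discount factor. Because the state is hidden, a policy \(\pi\) maps action-observation histories \(h_t = (a_0, o_1, \dots, a_{t-1}, o_t)\) to actions so as to maximize the expected discounted return \(\mathbb{E}\big[\sum_{t \ge 0} \gamma^t R(s_t, a_t)\big]\); model-free learning seeks such a \(\pi\) without first estimating \(T\) and \(\Omega\).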