Paper information - 11-041 Optimistic planning for sparsely stochastic systems


We describe an online planning algorithm for finite-action, sparsely stochastic Markov decision processes, in which the random state transitions can only end up in a small number of possible next states. The algorithm builds a planning tree by iteratively expanding states, where the most promising states are expanded first, in an optimistic procedure aiming to return a good action after a strictly limited number of expansions. The novel algorithm is called optimistic planning for sparsely stochastic systems.
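The expansion loop described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `Node`, `q_value`, `plan`, and `toy_model` are hypothetical, as is the model signature (a function returning a short list of `(probability, next_state, reward)` outcomes). Rewards are assumed to lie in [0, 1] and the discount factor 0.9 is an arbitrary choice. The leaf-selection rule (largest probability times discount power among the leaves reachable under the currently optimistic actions) and the final choice by lower bound are one plausible reading of "most promising states first".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    state: object
    depth: int
    # action -> list of (probability, reward, child Node); empty until expanded
    children: dict = field(default_factory=dict)


def q_value(node, action, gamma, bound):
    """Expected one-step reward plus discounted bound of the successors."""
    return sum(p * (r + gamma * bound(c, gamma)) for p, r, c in node.children[action])


def upper(node, gamma):
    """Optimistic bound: an unexpanded leaf is assumed to earn reward 1 forever."""
    if not node.children:
        return 1.0 / (1.0 - gamma)
    return max(q_value(node, a, gamma, upper) for a in node.children)


def lower(node, gamma):
    """Pessimistic bound: an unexpanded leaf is assumed to earn nothing."""
    if not node.children:
        return 0.0
    return max(q_value(node, a, gamma, lower) for a in node.children)


def optimistic_leaves(node, prob, gamma):
    """Yield (contribution, leaf) for leaves reached by following optimistic actions."""
    if not node.children:
        yield prob * gamma ** node.depth, node
        return
    best = max(node.children, key=lambda a: q_value(node, a, gamma, upper))
    for p, _, child in node.children[best]:
        yield from optimistic_leaves(child, prob * p, gamma)


def plan(root_state, actions, model, budget, gamma=0.9):
    """Expand `budget` (>= 1) nodes, then return the action with the best lower bound."""
    root = Node(root_state, 0)
    for _ in range(budget):
        _, leaf = max(optimistic_leaves(root, 1.0, gamma), key=lambda t: t[0])
        for a in actions:
            # Sparse stochasticity: model() returns only a few possible outcomes.
            leaf.children[a] = [
                (p, r, Node(s, leaf.depth + 1)) for p, s, r in model(leaf.state, a)
            ]
    return max(actions, key=lambda a: q_value(root, a, gamma, lower))


def toy_model(state, action):
    """Hypothetical two-state MDP used only to exercise the sketch."""
    if state == "trap":
        return [(1.0, "trap", 0.0)]
    if action == "good":
        return [(0.8, "start", 1.0), (0.2, "start", 0.0)]
    return [(1.0, "trap", 0.1)]


best_action = plan("start", ["good", "bad"], toy_model, budget=10)
```

The sketch recomputes the bounds recursively on every call, which keeps it short but would be wasteful for large budgets; caching the bounds and updating them along the path to the expanded leaf would be the natural refinement.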

R. Munos | L. Buşoniu | B. De Schutter | Robert Babuška

[1] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.

[2] Gabriel Kronberger, et al. Bandit-Based Monte-Carlo Planning for the Single-Machine Total Weighted Tardiness Scheduling Problem, 2007, EUROCAST.

[3] Rémi Munos, et al. Optimistic Planning of Deterministic Systems, 2008, EWRL.

[4] Rémi Munos, et al. Open Loop Optimistic Planning, 2010, COLT.

[5] Thomas J. Walsh, et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning, 2010, AAAI.

[6] Bart De Schutter, et al. Optimistic planning for sparsely stochastic systems, 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).