Paper information - An analysis of optimistic, best-first search for minimax sequential decision making

An analysis of optimistic, best-first search for minimax sequential decision making

We consider problems in which a maximizer and a minimizer agent take actions in turn, such as games or optimal control with uncertainty modeled as an opponent. We extend the ideas of optimistic optimization to this setting, obtaining a search algorithm that has been previously considered as the best-first search variant of the B* method. We provide a novel analysis of the algorithm relying on a certain structure for the values of action sequences, under which earlier actions are more important than later ones. An asymptotic branching factor is defined as a measure of problem complexity, and it is used to characterize the relationship between computation invested and near-optimality. In particular, when action importance decreases exponentially, convergence rates are obtained. Throughout, examples illustrate analytical concepts such as the branching factor. In an empirical study, we compare the optimistic best-first algorithm with two classical game tree search methods, and apply it to a challenging HIV infection control problem.
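The search procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions that do not appear in the text above: rewards lie in [0, 1], and a discount factor `gamma` stands in for exponentially decreasing action importance. The maximizer moves at even depths, and `step(state, action)` is a hypothetical deterministic model returning `(next_state, reward)`. The names `Node`, `make_leaf` and `optimistic_minimax` are likewise illustrative. Each node keeps a lower and an upper bound on its minimax value. The search follows the largest upper bound at maximizer nodes and the smallest lower bound at minimizer nodes, expands the leaf it reaches, and backs the bounds up to the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    state: object
    depth: int
    ret: float            # discounted reward accumulated on the path to this node
    lower: float          # lower bound on the minimax value through this node
    upper: float          # upper bound on the minimax value through this node
    children: dict = field(default_factory=dict)


def make_leaf(state, depth, ret, gamma):
    # Assumption: rewards in [0, 1], so the unexplored tail is worth
    # between 0 and gamma**depth / (1 - gamma).
    return Node(state, depth, ret, ret, ret + gamma ** depth / (1.0 - gamma))


def is_max(node):
    # Assumption: the maximizer acts at even depths, the minimizer at odd depths.
    return node.depth % 2 == 0


def optimistic_minimax(root_state, actions, step, budget, gamma=0.5):
    root = make_leaf(root_state, 0, 0.0, gamma)
    for _ in range(budget):
        # Follow the optimistic path from the root down to a leaf.
        node, path = root, []
        while node.children:
            path.append(node)
            kids = node.children.values()
            if is_max(node):
                node = max(kids, key=lambda c: c.upper)
            else:
                node = min(kids, key=lambda c: c.lower)
        # Expand the selected leaf with every available action.
        for a in actions:
            nxt, r = step(node.state, a)
            node.children[a] = make_leaf(
                nxt, node.depth + 1, node.ret + gamma ** node.depth * r, gamma
            )
        path.append(node)
        # Back up both bounds along the path, deepest node first.
        for n in reversed(path):
            agg = max if is_max(n) else min
            n.lower = agg(c.lower for c in n.children.values())
            n.upper = agg(c.upper for c in n.children.values())
    # Recommend the root action with the best guaranteed (lower-bound) value.
    best = max(root.children, key=lambda a: root.children[a].lower)
    return best, root.lower, root.upper
```

For example, calling `optimistic_minimax(0, [0, 1], lambda s, a: (s + 1, float(a)), budget=30)` on a toy problem where the reward equals the chosen action returns a recommended first action together with an interval that brackets the root's minimax value. The interval can only shrink as the budget grows, because each child's bounds lie inside the bounds of the leaf it replaced.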
