An analysis of optimistic, best-first search for minimax sequential decision making

We consider problems in which a maximizer agent and a minimizer agent take actions in turn, such as games or optimal control with uncertainty modeled as an opponent. We extend the ideas of optimistic optimization to this setting, which yields a search algorithm previously studied as the best-first search variant of the B* method. We provide a novel analysis of the algorithm that relies on a certain structure for the values of action sequences, under which earlier actions are more important than later ones. We define an asymptotic branching factor as a measure of problem complexity and use it to characterize the relationship between the computation invested and near-optimality. In particular, we obtain convergence rates for the case in which action importance decreases exponentially. Throughout, examples illustrate analytical concepts such as the branching factor. In an empirical study, we compare the optimistic best-first algorithm with two classical game tree search methods, and apply it to a challenging HIV infection control problem.
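To make the search procedure concrete, here is a minimal Python sketch of an optimistic best-first minimax search. It is an illustration under stated assumptions, not the paper's implementation: we assume rewards in [0, 1] discounted by a factor `gamma` in (0, 1), which is one way to obtain exponentially decreasing action importance, and we assume the maximizer moves at even depths. The selection rule (largest upper bound at maximizer nodes, smallest lower bound at minimizer nodes) and the final choice by largest lower bound are likewise assumptions, and the names `Node`, `make_node`, `expand`, `backup`, `optimistic_minimax_search`, and `toy_step` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    state: object
    depth: int
    partial: float  # discounted reward sum along the path to this node
    lower: float    # lower bound on the minimax value below this node
    upper: float    # upper bound on the minimax value below this node
    children: dict = field(default_factory=dict)


def make_node(state, depth, partial, gamma):
    # Assumption: rewards lie in [0, 1], so the unseen tail of any
    # sequence contributes between 0 and gamma**depth / (1 - gamma).
    return Node(state, depth, partial, partial,
                partial + gamma ** depth / (1.0 - gamma))


def expand(node, actions, step, gamma):
    # Create one child per action; step is a user-supplied model.
    for a in actions:
        next_state, reward = step(node.state, a, node.depth)
        child_partial = node.partial + gamma ** node.depth * reward
        node.children[a] = make_node(next_state, node.depth + 1,
                                     child_partial, gamma)


def backup(path):
    # Minimax backup of both bounds, from the expanded node to the root.
    for node in reversed(path):
        pick = max if node.depth % 2 == 0 else min
        node.lower = pick(c.lower for c in node.children.values())
        node.upper = pick(c.upper for c in node.children.values())


def optimistic_minimax_search(root_state, actions, step, gamma, budget):
    # budget = number of node expansions (must be at least 1).
    root = make_node(root_state, 0, 0.0, gamma)
    for _ in range(budget):
        node, path = root, [root]
        while node.children:
            if node.depth % 2 == 0:
                # Maximizer: follow the largest upper bound.
                node = max(node.children.values(), key=lambda c: c.upper)
            else:
                # Minimizer: follow the smallest lower bound.
                node = min(node.children.values(), key=lambda c: c.lower)
            path.append(node)
        expand(node, actions, step, gamma)
        backup(path)
    # Cautious final choice: root action with the largest lower bound.
    best = max(root.children, key=lambda a: root.children[a].lower)
    return best, root.lower, root.upper


def toy_step(state, action, depth):
    # Hypothetical stateless toy problem: the reward equals the action,
    # so the maximizer prefers 1 and the minimizer prefers 0.
    return state, float(action)


if __name__ == "__main__":
    print(optimistic_minimax_search(None, (0, 1), toy_step, 0.5, 50))
```

In the toy problem the minimax value is the sum of `gamma**k` over even `k`, that is `1 / (1 - gamma**2)`, and the returned interval `[lower, upper]` brackets it. The width of that interval after a given number of expansions is the kind of computation-versus-near-optimality relationship the analysis characterizes.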
