Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or, more recently, learns a stochastic policy to guide the search. In this paper, we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sample-based planners.
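As a rough illustration of the objective described above (not the paper's exact formulation), the marginal utility of the k-th generated candidate action can be thought of as the amount by which it improves on the best value among the actions generated before it; training a generator would then amount to maximizing these improvements over the generated sequence. The Python sketch below assumes the candidate values Q(s, a_1), ..., Q(s, a_k) are already available and simply measures the improvements; the `baseline` argument and example values are illustrative assumptions.

```python
def marginal_utilities(action_values, baseline=float("-inf")):
    """Illustrative sketch: given the values Q(s, a_1), ..., Q(s, a_k) of a
    sequence of generated candidate actions, return each action's marginal
    utility, i.e. how much it raises the best value found among the actions
    generated before it (zero if it does not improve on them).

    `baseline` is the value attributed to having generated no actions yet;
    this is an assumption of the sketch, not the paper's definition.
    """
    utilities = []
    best_so_far = baseline
    for v in action_values:
        if best_so_far == float("-inf"):
            utilities.append(v)  # first action: no prior candidates to improve on
        else:
            utilities.append(max(0.0, v - best_so_far))
        best_so_far = max(best_so_far, v)
    return utilities


if __name__ == "__main__":
    # Candidate values for one state; the second and fourth actions improve on
    # what was already generated, the third adds nothing.
    values = [0.5, 0.75, 0.5, 1.0]
    print(marginal_utilities(values))  # -> [0.5, 0.25, 0.0, 0.25]
```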
