Improved LinUCT and its evaluation on incremental random-feature tree

UCT is a standard Monte Carlo tree search (MCTS) algorithm, and MCTS has been applied to various domains with remarkable success. This study proposes Leaf-LinUCT, a family of improved LinUCT algorithms that incorporate LinUCB into MCTS. LinUCB outperforms UCB1 in contextual multi-armed bandit problems because it learns online by ridge regression. However, because of the minimax structure of game trees, the ridge regression in LinUCB does not always work well in tree search. In this paper, we remedy this problem and extend our previous work on LinUCT in two ways: (1) by restricting the training data for regression to the frontier nodes of the current search tree, and (2) by setting the feature vector of each internal node to the weighted mean of the feature vectors of its descendant nodes. We also present a new synthetic model, the incremental-random-feature tree, which extends the standard incremental random tree model. In our model, each node has a feature vector that represents the characteristics of the corresponding position. Each move randomly changes the elements of a node's feature vector from those of its parent, just as each move randomly changes the heuristic score of a node in the standard incremental random tree model. The experimental results show that Leaf-LinUCT outperformed UCT and existing LinUCT algorithms on the incremental-random-feature tree and on a synthetic game studied in [1].
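As a rough illustration of the two ingredients named above, the following minimal sketch shows a LinUCB-style score computed from a ridge-regression model, together with a weighted mean of feature vectors for an internal node. It is not the authors' implementation: the class name `LinUCBModel`, the helper `weighted_mean_features`, the parameters `alpha` and `ridge`, and the use of caller-supplied weights (for example visit counts) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


class LinUCBModel:
    """Ridge-regression model with a LinUCB-style upper confidence bound.

    Hypothetical sketch: alpha (exploration weight) and ridge (regularizer)
    are illustrative parameters, not values taken from the paper.
    """

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.dim = dim
        self.alpha = alpha
        # Keep the inverse of A = ridge * I + sum(x x^T) directly.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # b = sum(reward * x)

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.a_inv]

    def update(self, x, reward):
        """Add one (feature vector, reward) sample via the Sherman-Morrison formula."""
        ax = self._a_inv_times(x)
        denom = 1.0 + sum(x[i] * ax[i] for i in range(self.dim))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]

    def theta(self):
        """Ridge-regression weights: theta = A^{-1} b."""
        return self._a_inv_times(self.b)

    def ucb(self, x):
        """Predicted reward plus alpha * sqrt(x^T A^{-1} x)."""
        ax = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        width = math.sqrt(max(0.0, sum(xi * axi for xi, axi in zip(x, ax))))
        return mean + self.alpha * width


def weighted_mean_features(features, weights):
    """Weighted mean of descendant feature vectors.

    The weighting scheme is an assumption; the caller supplies the weights.
    """
    total = float(sum(weights))
    dim = len(features[0])
    return [sum(w * f[k] for f, w in zip(features, weights)) / total
            for k in range(dim)]


model = LinUCBModel(dim=2)
model.update([1.0, 0.0], reward=1.0)   # one sample observed at a frontier node
seen = model.ucb([1.0, 0.0])           # lower uncertainty, nonzero predicted reward
unseen = model.ucb([0.0, 1.0])         # no data in this direction, wider bound
parent = weighted_mean_features([[1.0, 0.0], [0.0, 1.0]], [3, 1])
```

In this sketch, only samples passed to `update` influence the regression, so restricting those samples to frontier nodes is a matter of which nodes the search chooses to feed in.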

[1] David Silver, et al. Monte-Carlo tree search and rapid action value estimation in computer Go, 2011, Artif. Intell.

[2] Michael Buro, et al. Improving heuristic mini-max search by supervised learning, 2002, Artif. Intell.

[3] Rémi Munos, et al. Online gradient descent for least squares regression: Non-asymptotic bounds and application to bandits, 2013, ArXiv.

[4] H. Jaap van den Herik, et al. Progressive Strategies for Monte-Carlo Tree Search, 2008.

[5] Michèle Sebag, et al. The grand challenge of computer Go, 2012, Commun. ACM.

[6] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.

[7] Akihiro Kishimoto, et al. Scalable Distributed Monte-Carlo Tree Search, 2011, SOCS.

[8] Dana S. Nau, et al. An Analysis of Forward Pruning, 1994, AAAI.

[9] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.

[10] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.

[11] Tomoyuki Kaneko, et al. LinUCB Applied to Monte-Carlo Tree Search, 2015, ACG.

[12] Tomoyuki Kaneko, et al. Large-Scale Optimization for Evaluation Functions with Minimax Search, 2014, J. Artif. Intell. Res.

[13] Y. Björnsson, et al. Game-Tree Properties and MCTS Performance, 2011.

[14] Thomas J. Walsh, et al. Exploring compact reinforcement-learning representations with linear regression, 2009, UAI.

[15] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.

[16] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.

[17] Christopher D. Rosin, et al. Multi-armed bandits with episode context, 2011, Annals of Mathematics and Artificial Intelligence.

[18] Damien Ernst, et al. Comparison of different selection strategies in Monte-Carlo Tree Search for the game of Tron, 2012, 2012 IEEE Conference on Computational Intelligence and Games (CIG).

[19] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[20] Michael Buro, et al. Minimum Proof Graphs and Fastest-Cut-First Search Heuristics, 2009, IJCAI.

[21] Yuandong Tian, et al. Better Computer Go Player with Neural Network and Long-term Prediction, 2016, ICLR.

[22] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[23] Donald E. Knuth, et al. The Solution for the Branching Factor of the Alpha-Beta Pruning Algorithm, 1981, ICALP.

[24] Tomoyuki Kaneko, et al. Enhancements in Monte Carlo tree search algorithms for biased game trees, 2015, 2015 IEEE Conference on Computational Intelligence and Games (CIG).

[25] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[26] Richard E. Korf, et al. Best-First Minimax Search, 1996, Artif. Intell.