A new way to search game trees: technical perspective
In 1994, the program Chinook won the title "Man-Machine World Champion" in checkers from Marion Tinsley. Shortly thereafter, in 1997, the program Deep Blue bested the human chess champion Garry Kasparov. This past year, IBM researchers set the stakes even higher: on network TV, their program Watson defeated two human champions in the game show "Jeopardy!" Researchers in artificial intelligence (AI) are buoyed by these achievements, but they also know that computers still lag far behind human levels of play in many other games of strategy. One such contest of wits is Go, an ancient two-player board game from China. With simple rules but dizzyingly complex strategies, the game of Go remains a stronghold of human superiority, thwarting the kinds of approaches that have proven so successful in checkers and chess.

That may be about to change. After decades of halting progress in computer Go, variations of Monte Carlo tree search (MCTS) have shown themselves to be a real, well, game changer. MCTS is an approach to decision making that uses random sampling to quickly distinguish strong and weak moves. The following paper documents the dramatic breakthroughs achieved by applying MCTS to Go.

Why do these results matter for the broader field of AI? The history of AI has shown that algorithms and ideas pioneered in the context of game playing have a wide range of important applications. Insights from computer game playing have often constituted the "1% inspiration" in Edison's formula for creative success. Combined with the "99% perspiration" of careful engineering, they underpin a vast number of systems in common use today. In addition to the potential applications of MCTS, the more immediate achievement is impressive in its own right. The authors describe a line of work that, over five years, has halved the number of rating levels between today's computers and the world's top human players.

Why has it been so difficult for computers to play world-class Go? In a sense, writing a program for game playing is easy: When it's your turn, make the best move. Of course, picking that move is where the real challenge comes in. In the field of reinforcement learning that gave rise to the key ideas behind MCTS, a rule for choosing a move as a function of the game state is known as a policy. A rule for evaluating a game state to predict, say, the likelihood of an expert player winning from that game state is called a value function. While a good policy—one that plays at a world-class level—is the ultimate goal, value functions have proven quite useful as well.

A sufficiently accurate value function can be used to create a policy. Just contemplate each legal move in turn, evaluate the game state that would be the outcome of that move, and then greedily pick the move with the best outcome. A weak value function can often be improved upon by considering sequences, say, all moves out to a length of 15 turns, instead of one move at a time. This approach, called game-tree search, has proven wildly successful in many games, notably chess and checkers. Still, Go has resisted this line of attack. The huge branching factor and the importance of positional play make it difficult to define a good value function. Fortunately, the relationship between policy and value function can be applied the other way. Given a policy, we can find its value function by playing the game through to the end using the policy at each step to select a move, then returning the value of the final game state.
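To make these two conversions concrete, here is a minimal sketch in Python. It assumes a hypothetical game-state interface (legal_moves, apply, is_terminal, value) that does not come from the paper; it only illustrates the ideas above, not any particular program. Averaging over several playouts in the second function anticipates the random sampling at the heart of MCTS.

```python
# Hypothetical game-state interface assumed throughout this sketch:
#   legal_moves() lists the available moves, apply(move) returns the successor
#   state, is_terminal() flags the end of the game, and value() scores a
#   finished game (say, 1 for a win and 0 for a loss).

def greedy_policy(state, value_fn):
    """Value function -> policy: evaluate the state each legal move would
    produce, then greedily pick the move with the best outcome."""
    return max(state.legal_moves(), key=lambda m: value_fn(state.apply(m)))

def rollout_value(state, policy, n_playouts=100):
    """Policy -> value estimate: play the game through to the end using the
    policy at each step, and average the values of the final game states."""
    total = 0.0
    for _ in range(n_playouts):
        s = state
        while not s.is_terminal():
            s = s.apply(policy(s))
        total += s.value()
    return total / n_playouts
```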
Moreover, as in game-tree search, we can also amplify the strength of a "good enough" policy. If the initial policy has a degree of randomness in it, the algorithm can keep track of which random decisions tend to lead to better outcomes and gradually adjust the policy to make those decisions more often. The resulting scheme explores the space of possibilities in a novel way. Where game-tree search is wide but shallow, this policy-oriented search is narrow but deep. Trajectories visit end-game states, but not all possible paths are followed—just the most promising ones. This process of turning game-tree search on its head, known as MCTS, has revolutionized the quality of computer play for Go.

Given the recent progress in computer Go, many researchers are interested in applying variants of MCTS to other problems beyond the world of games. The next few years are likely to see an explosion of work providing insight into what types of problems are best suited to these techniques. One might say that Go has given the green light to an exciting new class of algorithms.
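As one way of picturing the adaptive sampling described above, the following sketch keeps per-move win and visit counts and biases further random playouts toward moves that have worked well so far. It is a flat, one-level illustration using the standard UCB1 selection rule; full MCTS grows a tree of such statistics. Nothing here is drawn from the authors' implementation, and the game-state interface is the same hypothetical one as in the earlier sketch.

```python
import math
import random

def choose_move(state, n_playouts=1000, c=1.4):
    """Run random playouts from each candidate move, steering more playouts
    toward moves with better observed outcomes, then return the move that
    attracted the most playouts."""
    moves = state.legal_moves()
    wins = {m: 0.0 for m in moves}   # assumes moves are hashable
    visits = {m: 0 for m in moves}

    for t in range(1, n_playouts + 1):
        # UCB1: favor moves with high win rates, but keep revisiting
        # rarely tried moves so no option is starved of samples.
        move = max(moves, key=lambda m: float("inf") if visits[m] == 0
                   else wins[m] / visits[m] + c * math.sqrt(math.log(t) / visits[m]))

        # Finish the game with uniformly random play and record the outcome.
        s = state.apply(move)
        while not s.is_terminal():
            s = s.apply(random.choice(s.legal_moves()))
        visits[move] += 1
        wins[move] += s.value()  # assumed to score the game for the player to move at `state`

    return max(moves, key=lambda m: visits[m])
```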