Analysis of Watson's Strategies for Playing Jeopardy!

Major advances in Question Answering technology were needed for IBM Watson to play Jeopardy! at championship level: the show requires rapid-fire answers to challenging natural language questions, broad general knowledge, high precision, and accurate confidence estimates. In addition, Jeopardy! features four types of decision making of great strategic importance: (1) Daily Double wagering; (2) Final Jeopardy wagering; (3) selecting the next square when in control of the board; and (4) deciding whether to attempt to answer, i.e., "buzz in." Sophisticated strategies for these decisions, ones that properly account for the game state and future event probabilities, can significantly boost a player's overall chances of winning compared with simple rule-of-thumb strategies. This article presents our approach to developing Watson's game-playing strategies: we first built a faithful simulation model of the game, and then used learning and Monte Carlo methods within the simulator to optimize Watson's strategic decision making. After a detailed description of each game-strategy algorithm, we focus in particular on validating the accuracy of the simulator's predictions and on documenting the performance improvements obtained with our methods. Quantitative benefits are shown relative both to simple heuristic strategies and to actual human contestant performance in historical episodes. We further extend our analysis of human play to derive a number of valuable and counterintuitive examples illustrating how human contestants could improve their performance on the show.
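To make the simulation-based approach concrete, the following is a minimal sketch of Monte Carlo wager evaluation of the kind the abstract describes, here applied to Final Jeopardy. It is an illustration only, not Watson's actual algorithm: the function names, the fixed $100 wager grid, and the assumption that opponents' wagers and answer accuracies are known and independent are all simplifications introduced for this example.

```python
import random

def win_prob(my_score, opp_scores, my_wager, opp_wagers,
             p_right, opp_p_right, trials=10000):
    """Estimate P(win) for one candidate Final Jeopardy wager by
    sampling correct/incorrect outcomes for every player."""
    wins = 0
    for _ in range(trials):
        # My final score: gain the wager if right, lose it if wrong.
        mine = my_score + (my_wager if random.random() < p_right else -my_wager)
        # Opponents' final scores, sampled the same way.
        finals = [s + (w if random.random() < p else -w)
                  for s, w, p in zip(opp_scores, opp_wagers, opp_p_right)]
        if mine > max(finals):
            wins += 1
    return wins / trials

def best_wager(my_score, opp_scores, opp_wagers, p_right, opp_p_right):
    """Scan candidate wagers in $100 steps and return the one with
    the highest simulated win probability."""
    candidates = range(0, my_score + 1, 100)
    return max(candidates,
               key=lambda w: win_prob(my_score, opp_scores, w,
                                      opp_wagers, p_right, opp_p_right))
```

For example, trailing $1,000 to $1,500 against an opponent who wagers nothing, a player certain of answering correctly must bet more than $500, and the scan finds $600 as the smallest sufficient wager. The article's actual strategies go well beyond this sketch by modeling correlated accuracies and opponents' wagering behavior within the full game simulator.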
