Paper information - Play selection in American football: a case study in neuro-dynamic programming

We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration.
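To make the formulation concrete, the following is a minimal sketch of exact policy iteration on a toy stochastic shortest path model of an offensive drive. It is not the model or the algorithm from the paper: the state space (zones to the goal line, down), the three plays, every probability and reward, and all function names are hypothetical values chosen only for illustration.

```python
from itertools import product

# Hypothetical toy drive: 4 zones to the goal line, 3 downs per series.
ZONES, DOWNS = 4, 3
ACTIONS = ("run", "pass", "kick")
STATES = list(product(range(1, ZONES + 1), range(1, DOWNS + 1)))


def outcomes(state, action):
    """Return [(prob, reward, next_state)]; next_state None means the drive ended."""
    x, d = state

    def after_gain(g):
        # Assumed rule: any gain earns a fresh set of downs; zone 0 is a touchdown.
        return (7.0, None) if x - g <= 0 else (0.0, (x - g, 1))

    def after_no_gain():
        # Assumed rule: running out of downs ends the drive with no points.
        return (0.0, None) if d == DOWNS else (0.0, (x, d + 1))

    if action == "run":
        return [(0.6, *after_gain(1)), (0.4, *after_no_gain())]
    if action == "pass":
        # The 0.10 branch is a turnover with an assumed penalty of -1.
        return [(0.35, *after_gain(2)), (0.55, *after_no_gain()), (0.10, -1.0, None)]
    # Field goal attempt: success is less likely farther from the goal line.
    p_good = max(0.0, 0.9 - 0.2 * (x - 1))
    return [(p_good, 3.0, None), (1.0 - p_good, 0.0, None)]


def q_value(state, action, J):
    """Expected net score of taking `action` in `state`, then following values J."""
    return sum(p * (r + (J[s2] if s2 is not None else 0.0))
               for p, r, s2 in outcomes(state, action))


def evaluate(policy, tol=1e-12):
    """Policy evaluation by repeated sweeps; every drive here ends in finitely many plays."""
    J = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            v = q_value(s, policy[s], J)
            delta = max(delta, abs(v - J[s]))
            J[s] = v
        if delta < tol:
            return J


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    policy = {s: "run" for s in STATES}
    while True:
        J = evaluate(policy)
        improved = {s: max(ACTIONS, key=lambda a: q_value(s, a, J)) for s in STATES}
        if improved == policy:
            return policy, J
        policy = improved


if __name__ == "__main__":
    policy, J = policy_iteration()
    for s in STATES:
        print(s, policy[s], round(J[s], 3))
```

With only 12 states, the expected net score can be stored in a table and evaluated exactly. Neuro-dynamic programming addresses the case where the state space is too large for that, replacing the table with a trained approximation of the values.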

Dimitri P. Bertsekas | Stephen D. Patek
