Approximate Solutions to Optimal Stopping Problems

We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible aperiodic Markov chain. The scheme approximates a Q-function by a linear combination of fixed basis functions, whose weights are updated incrementally through an iterative process similar to Q-learning that relies on simulation of the underlying Markov chain. Due to space limitations, we provide only an overview of a proof of convergence (with probability 1) and of bounds on the approximation error. This is the first theoretical result establishing the soundness of a Q-learning-like algorithm combined with arbitrary linear function approximators for solving a sequential decision problem. Though this paper focuses on the case of finite state spaces, the results extend naturally to continuous and unbounded state spaces, which are addressed in a forthcoming full-length paper.
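To convey the flavor of the scheme, consider one standard formulation of optimal stopping (an assumption here, since this section fixes no notation): with continuation reward g(x), stopping reward G(x), and discount factor α, the Q-value of continuing satisfies Q(x) = g(x) + α E[max(G(x'), Q(x'))], because the stopping action remains available at every step. The Python sketch below implements a plausible instantiation of the simulation-based weight update; the interfaces sample_next_state, g, G, and phi, as well as the harmonic step-size schedule, are illustrative assumptions rather than the paper's definitive algorithm.

```python
import numpy as np

def q_learning_optimal_stopping(sample_next_state, g, G, phi, x0,
                                alpha=0.95, n_steps=100_000):
    """Sketch: Q-learning with linear function approximation for
    optimal stopping. All names and signatures are illustrative.

    sample_next_state(x) -- simulates one transition of the Markov chain
    g(x)                 -- reward per step while continuing
    G(x)                 -- reward received upon stopping in state x
    phi(x)               -- vector of fixed basis function values at x
    alpha                -- discount factor in (0, 1)
    """
    r = np.zeros(len(phi(x0)))          # weights of the linear combination
    x = x0
    for t in range(1, n_steps + 1):
        x_next = sample_next_state(x)   # simulate the underlying chain
        q_next = phi(x_next) @ r        # approximate Q-value of continuing
        # Target takes the max of stopping and continuing, since the
        # decision-maker may stop at any step.
        target = g(x) + alpha * max(G(x_next), q_next)
        delta = target - phi(x) @ r     # temporal-difference error
        gamma_t = 1.0 / t               # diminishing step size
        r += gamma_t * delta * phi(x)   # incremental weight update
        x = x_next
    return r
```

The step sizes γ_t = 1/t are one common choice satisfying the usual stochastic-approximation conditions (sums diverge, squared sums converge); the convergence and error-bound results outlined in the paper concern the limiting weight vector produced by updates of this general form.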