Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives

The authors develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work by its view of per-stage and terminal reward functions as elements of a certain Hilbert space. In addition to a streamlined analysis establishing existence and uniqueness of a solution to Bellman's equation, this approach provides an elegant framework for the study of approximate solutions. In particular, the authors propose a stochastic approximation algorithm that tunes the weights of a linear combination of basis functions in order to approximate a value function. They prove that this algorithm converges (almost surely) and that the limit of convergence has some desirable properties. The utility of the approximation method is illustrated via a computational case study involving the pricing of a path-dependent financial derivative security that gives rise to an optimal stopping problem with a 100-dimensional state space.
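To make the flavor of such a method concrete, the following minimal sketch (in Python) illustrates a stochastic-approximation update that tunes the weights of a linear combination of basis functions toward an approximate value function for a discounted optimal stopping problem. It is an illustration under assumptions, not the authors' exact algorithm: the Markov chain, the feature map features, the reward functions per_stage_reward and stopping_reward, the discount factor, and the step-size schedule are all hypothetical placeholders.

# Hypothetical sketch (not the paper's exact algorithm) of a stochastic
# approximation scheme that tunes the weights r of a linear value-function
# approximation J(x) ~ phi(x)' r for a discounted optimal stopping problem.
# The chain, features, rewards, and step sizes below are assumptions.

import numpy as np

rng = np.random.default_rng(0)

alpha = 0.95          # discount factor (assumed)
num_steps = 200_000   # length of the simulated trajectory (assumed)

def step_chain(x):
    """One transition of an illustrative ergodic Markov chain
    (a mean-reverting random walk on the real line)."""
    return 0.9 * x + rng.normal(scale=0.5)

def per_stage_reward(x):
    """Illustrative per-stage reward g(x) earned while continuing."""
    return -0.01 * abs(x)

def stopping_reward(x):
    """Illustrative terminal reward G(x) earned upon stopping
    (a put-like payoff)."""
    return max(1.0 - x, 0.0)

def features(x):
    """Fixed basis functions phi(x); the value function is approximated
    by the linear combination phi(x)' r."""
    return np.array([1.0, x, x * x, stopping_reward(x)])

r = np.zeros(4)       # weights of the linear combination of basis functions
x = 0.0

for k in range(num_steps):
    x_next = step_chain(x)
    phi = features(x)
    # Value of the next state when one may either stop (collect G) or
    # continue under the current approximation.
    next_value = max(stopping_reward(x_next), features(x_next) @ r)
    # Temporal-difference-style error toward the one-step Bellman target.
    td_error = per_stage_reward(x) + alpha * next_value - phi @ r
    gamma_k = 1.0 / (1_000 + k)     # diminishing step size (assumed schedule)
    r += gamma_k * td_error * phi   # stochastic approximation update
    x = x_next

# The learned weights define an approximate value function; stopping
# whenever the stopping reward exceeds the approximate continuation value
# yields an approximate stopping rule.
print("learned weights:", r)

The design choice mirrored here is that each update nudges phi(x)' r toward a one-step Bellman target in which the next state's value is the better of stopping immediately and continuing under the current approximation.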
