Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives

The authors develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work by its view of per-stage and terminal reward functions as elements of a certain Hilbert space. In addition to a streamlined analysis establishing existence and uniqueness of a solution to Bellman's equation, this approach provides an elegant framework for the study of approximate solutions. In particular, the authors propose a stochastic approximation algorithm that tunes the weights of a linear combination of basis functions in order to approximate a value function. They prove that this algorithm converges (almost surely) and that the limit of convergence has some desirable properties. The utility of the approximation method is illustrated via a computational case study involving the pricing of a path-dependent financial derivative security that gives rise to an optimal stopping problem with a 100-dimensional state space.
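To make the flavor of such a method concrete, the following minimal sketch (in Python) illustrates a stochastic-approximation update that tunes the weights of a linear combination of basis functions toward an approximate value function for a discounted optimal stopping problem. It is an illustration under assumptions, not the authors' exact algorithm: the Markov chain, the feature map features, the reward functions per_stage_reward and stopping_reward, the discount factor, and the step-size schedule are all hypothetical placeholders.

# Hypothetical sketch (not the paper's exact algorithm) of a stochastic
# approximation scheme that tunes the weights r of a linear value-function
# approximation J(x) ~ phi(x)' r for a discounted optimal stopping problem.
# The chain, features, rewards, and step sizes below are assumptions.

import numpy as np

rng = np.random.default_rng(0)

alpha = 0.95          # discount factor (assumed)
num_steps = 200_000   # length of the simulated trajectory (assumed)

def step_chain(x):
    """One transition of an illustrative ergodic Markov chain
    (a mean-reverting random walk on the real line)."""
    return 0.9 * x + rng.normal(scale=0.5)

def per_stage_reward(x):
    """Illustrative per-stage reward g(x) earned while continuing."""
    return -0.01 * abs(x)

def stopping_reward(x):
    """Illustrative terminal reward G(x) earned upon stopping
    (a put-like payoff)."""
    return max(1.0 - x, 0.0)

def features(x):
    """Fixed basis functions phi(x); the value function is approximated
    by the linear combination phi(x)' r."""
    return np.array([1.0, x, x * x, stopping_reward(x)])

r = np.zeros(4)       # weights of the linear combination of basis functions
x = 0.0

for k in range(num_steps):
    x_next = step_chain(x)
    phi = features(x)
    # Value of the next state when one may either stop (collect G) or
    # continue under the current approximation.
    next_value = max(stopping_reward(x_next), features(x_next) @ r)
    # Temporal-difference-style error toward the one-step Bellman target.
    td_error = per_stage_reward(x) + alpha * next_value - phi @ r
    gamma_k = 1.0 / (1_000 + k)     # diminishing step size (assumed schedule)
    r += gamma_k * td_error * phi   # stochastic approximation update
    x = x_next

# The learned weights define an approximate value function; stopping
# whenever the stopping reward exceeds the approximate continuation value
# yields an approximate stopping rule.
print("learned weights:", r)

The design choice mirrored here is that each update nudges phi(x)' r toward a one-step Bellman target in which the next state's value is the better of stopping immediately and continuing under the current approximation.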
