Learning Exercise Policies for American Options

Options are important instruments in modern finance. In this paper, we investigate reinforcement learning (RL) methods, in particular least-squares policy iteration (LSPI), for the problem of learning exercise policies for American options. We develop finite-time bounds on the performance of the policy obtained with LSPI and compare LSPI and the fitted Q-iteration algorithm (FQI) with the Longstaff-Schwartz method (LSM), the standard least-squares Monte Carlo algorithm from the finance community. Our empirical results show that the exercise policies discovered by LSPI and FQI obtain larger payoffs than those discovered by LSM, on both real and synthetic data. Furthermore, we find that for all methods, policies learned from real data generally obtain payoffs similar to those of policies learned from simulated data. Our work shows that solution methods developed in machine learning can advance the state of the art in an important and challenging application area, while demonstrating that computational finance remains a promising area for future applications of machine learning methods.
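To make the LSM baseline concrete, here is a minimal pure-Python sketch of the Longstaff-Schwartz least-squares Monte Carlo idea for an American put. It assumes risk-neutral geometric Brownian motion for the underlying and a quadratic polynomial basis (1, x, x^2) in the normalized price x = S/S0. The function names, parameters, and basis are illustrative assumptions, not the paper's experimental setup. Working backward from maturity, the sketch regresses discounted future cash flows on the basis over in-the-money paths, then exercises wherever the immediate payoff exceeds the fitted continuation value.

```python
import math
import random


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lsm_american_put(s0, strike, rate, sigma, maturity,
                     n_steps=50, n_paths=10_000, seed=0):
    """Hypothetical LSM sketch: estimated time-0 value of an American put."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    disc = math.exp(-rate * dt)  # one-step discount factor
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)

    def payoff(s):
        return max(strike - s, 0.0)

    # Simulate risk-neutral GBM paths: paths[i][t] for t = 0..n_steps.
    paths = []
    for _ in range(n_paths):
        s = s0
        path = [s]
        for _ in range(n_steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        paths.append(path)

    # cash[i]: cash flow of path i under the current policy, valued at step t.
    cash = [payoff(p[n_steps]) for p in paths]
    for t in range(n_steps - 1, 0, -1):
        cash = [c * disc for c in cash]  # discount from t + 1 back to t
        itm = [i for i in range(n_paths) if payoff(paths[i][t]) > 0.0]
        if len(itm) < 3:
            continue  # too few in-the-money paths for a 3-term regression
        # Least squares via normal equations on the basis (1, x, x^2).
        xtx = [[0.0] * 3 for _ in range(3)]
        xty = [0.0] * 3
        for i in itm:
            x = paths[i][t] / s0
            phi = (1.0, x, x * x)
            for j in range(3):
                xty[j] += phi[j] * cash[i]
                for k in range(3):
                    xtx[j][k] += phi[j] * phi[k]
        w = _solve(xtx, xty)
        for i in itm:
            x = paths[i][t] / s0
            continuation = w[0] + w[1] * x + w[2] * x * x
            exercise = payoff(paths[i][t])
            if exercise > continuation:
                cash[i] = exercise  # exercise now, replacing later cash flows
    # Discount from t = 1 to t = 0 and compare with immediate exercise.
    continuation_value = disc * sum(cash) / n_paths
    return max(payoff(s0), continuation_value)


print(lsm_american_put(36.0, 40.0, 0.06, 0.2, 1.0))
```

The sketch covers only the LSM baseline, which fits a separate regression at each exercise date. LSPI and FQI, the methods the paper studies, instead iterate on an approximate action-value function and are not shown here.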
