Optimal Stopping Under Partial Observation: Near-Value Iteration

We propose near-value iteration (NVI), a new approximate value iteration method for continuous-state optimal stopping problems under partial observation. Such problems generally cannot be solved analytically and are also challenging to solve numerically. NVI is motivated by the representation of the value function as the supremum over an uncountable set of linear functions of the belief state. By rearranging the operations in the value function update equation, we reduce this set to only two functions at every time step, which yields significant computational savings. The NVI approximation is bounded by the tightest lower and upper bounds achievable by existing algorithms in the same class, so it is closer to the true value function than at least one of these bounds. We demonstrate the effectiveness of the approach on an example of pricing American options under stochastic volatility.
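To make the "supremum over linear functions of the belief state" representation concrete, the following is a minimal sketch for a finite-state simplification, not the NVI algorithm itself. It assumes the belief is a probability vector over a small discretized state space and that the set of linear functions is a finite list of coefficient vectors. The function name `value_at_belief` and the numbers in the example are hypothetical and chosen only for illustration.

```python
def value_at_belief(alpha_vectors, belief):
    """Evaluate V(b) = max over alpha of <alpha, b> for a finite set of vectors.

    alpha_vectors: list of coefficient vectors, one per linear function
                   (assumed finite here; the abstract refers to an uncountable set).
    belief:        probability vector over the (discretized) hidden states.
    Returns the maximal value and the index of the maximizing vector.
    """
    # Inner product of each linear function with the belief state.
    values = [sum(a * b for a, b in zip(alpha, belief)) for alpha in alpha_vectors]
    # Index of the linear function attaining the maximum.
    best = max(range(len(values)), key=values.__getitem__)
    return values[best], best


# Hypothetical three-state example with two linear functions,
# e.g. one associated with stopping and one with continuing.
alphas = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.5],
]
belief = [0.2, 0.5, 0.3]

value, index = value_at_belief(alphas, belief)
print(value, index)
```

Keeping only two linear functions per time step, as the abstract describes, would make this maximization trivially cheap. How those two functions are constructed is specific to NVI and is not reproduced here.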
