Using Convex Switching Techniques for Partially Observable Decision Processes

We present and analyze a novel method for solving certain discrete-time optimal control problems. Our approach relies on linear state dynamics and on convexity assumptions that are commonly satisfied in practical applications. We show that the important class of optimal switching problems under partial observation is covered by our methodology, and we exploit specific model features to obtain a numerical solution in a simple algorithmic form.
