An Algorithm for Making Regime-Changing Markov Decisions

In industrial applications, optimal sequential decision making is naturally formulated and solved within the standard framework of Markov decision theory. In practice, however, decisions must be made under incomplete and uncertain information about parameters and transition probabilities. This situation arises when a system may undergo a regime switch that changes not only the transition probabilities but also the control costs. After such an event, the effect of the actions may be reversed, so that all strategies must be revised. Due to the practical importance of this problem, a variety of methods have been suggested, ranging from incorporating regime switches into the Markov dynamics to numerous concepts addressing model uncertainty. In this work, we suggest a pragmatic approach: by naturally reformulating the problem as a so-called convex switching system, we make efficient numerical algorithms applicable.
