Markov decision processes with observation costs

We present a framework for a controlled Markov chain where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies therefore involve the choice of observation times as well as the subsequent control values. We show that the corresponding value function satisfies a dynamic programming principle, which leads to a system of quasivariational inequalities (QVIs). Next, we give an extension where the model parameters are not known a priori but are inferred from the costly observations by Bayesian updates. We then prove a comparison principle for a larger class of QVIs, which implies uniqueness of solutions to our proposed problem. We utilise penalty methods to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications which illustrate our framework.
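To make the dynamic programming principle concrete, the following is a minimal sketch and not the scheme used in the paper: it uses plain value iteration rather than a penalty method, and it rests on several simplifying assumptions of our own. Time is discrete, the control is held constant between observations, the gap until the next observation is capped at a hypothetical `max_gap`, and costs are discounted by `gamma`. The function name and all parameter names are illustrative. After observing state i, the controller picks an action a and a gap n, pays the expected running cost over n unobserved steps, then pays `c_obs` to observe again.

```python
import numpy as np

def solve_observation_cost_mdp(P, c, c_obs, gamma=0.95, max_gap=10,
                               tol=1e-10, max_iter=10_000):
    """Value iteration for a toy MDP with costly observations (illustrative).

    P: array (A, S, S), one transition matrix per action.
    c: array (A, S), running cost per action and state.
    c_obs: cost paid at each observation.
    Returns (v, action, gap), each of shape (S,), indexed by observed state.
    """
    A, S, _ = P.shape
    run = np.zeros((A, max_gap, S))       # expected discounted cost over n steps
    kern = np.zeros((A, max_gap, S, S))   # n-step kernel P[a]^n
    for a in range(A):
        Pk = np.eye(S)
        acc = np.zeros(S)
        for n in range(1, max_gap + 1):
            acc = acc + gamma ** (n - 1) * (Pk @ c[a])
            Pk = Pk @ P[a]
            run[a, n - 1] = acc
            kern[a, n - 1] = Pk
    disc = gamma ** np.arange(1, max_gap + 1)

    def q_values(v):
        # q[a, n-1, i]: cost of holding action a for n steps from state i,
        # then paying c_obs and continuing optimally from the observed state.
        return run + disc[None, :, None] * (c_obs + kern @ v)

    v = np.zeros(S)
    for _ in range(max_iter):
        v_new = q_values(v).reshape(A * max_gap, S).min(axis=0)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    best = q_values(v).reshape(A * max_gap, S).argmin(axis=0)
    return v, best // max_gap, best % max_gap + 1

# Hypothetical two-state, two-action example.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
c = np.array([[1.0, 0.0],
              [0.5, 0.5]])
v, action, gap = solve_observation_cost_mdp(P, c, c_obs=0.2)
```

In this toy setting, raising `c_obs` can only increase the value function and pushes the optimal gap towards `max_gap`, which mirrors the trade-off between information and observation cost described above.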
