Optimality Conditions for Long-Run Average Rewards With Underselectivity and Nonsmooth Features

We study three issues associated with the optimization of long-run average rewards of time-nonhomogeneous Markov processes in continuous time with continuous state spaces: 1) underselectivity, i.e., the optimality of a policy does not depend on its actions in any finite time period; 2) the related issue of bias optimality, i.e., finding policies that optimize both the long-run average and the transient total rewards; and 3) the effect of nonsmooth points of a value function on performance optimization. These issues require considering the performance over the entire infinite horizon, and are therefore not easily solved by dynamic programming, which works backwards in time and takes a local view at a particular time instant. In this paper, we take a different approach, called relative optimization theory, which is based on a direct comparison of the performance measures of any two policies. We derive tight necessary and sufficient optimality conditions that take underselectivity into consideration; we derive bias optimality conditions for both long-run average and transient rewards; and we show that the effect of a wide class of nonsmooth points of a value function, called semismooth points, on the long-run average performance is zero and can be ignored.
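
As a rough illustration of underselectivity, the sketch below writes the long-run average reward in generic notation that is assumed here and not taken from the paper: a policy $u$, a state process $X^u(t)$, a reward rate $f$ bounded by a constant $M$, and a fixed finite time $T_0$. It only shows that the reward accumulated on $[0, T_0]$ contributes nothing to the average; it does not address how actions on $[0, T_0]$ may affect the process afterwards.

```latex
% Assumed notation (not from the paper): policy u, state X^u(t),
% reward rate f with |f| <= M, fixed finite time T_0.
\[
  \eta^{u} \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
  E\!\left[ \int_{0}^{T} f\bigl(t, X^{u}(t)\bigr)\, dt \right].
\]
% The reward collected on [0, T_0] is averaged away as T grows:
\[
  \left| \frac{1}{T}\, E\!\left[ \int_{0}^{T_0} f\bigl(t, X^{u}(t)\bigr)\, dt \right] \right|
  \;\le\; \frac{M\, T_0}{T} \;\longrightarrow\; 0
  \qquad (T \to \infty).
\]
```

Under these assumptions, two policies that differ only in the reward they collect on a finite period have the same long-run average, which is why a criterion that also accounts for transient rewards, such as bias optimality, is needed to distinguish them.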
