A probabilistic analysis of bias optimality in unichain Markov decision processes

This paper focuses on bias optimality in unichain Markov decision processes with finite state and action spaces. Using relative value functions, we present methods for evaluating the optimal bias. This leads to a probabilistic analysis that transforms the original reward problem into a minimum average cost problem. The result is an explanation of how and why bias implicitly discounts future rewards.
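As a rough illustration of how a relative value function relates to the bias, the sketch below evaluates a single fixed policy on a unichain model. It solves the standard evaluation equations w + g·1 = r + P w with w pinned to zero at a reference state, then shifts w so that it is orthogonal to the stationary distribution, which gives the bias. The function names (`solve`, `gain_and_bias`), the reference-state convention, and the two-state chain are assumptions made for this example and are not taken from the paper; it is not the paper's own evaluation method.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gain_and_bias(p, r, ref=0):
    """Return (gain, relative values, bias) for a fixed policy.

    p is the transition matrix and r the reward vector under that policy.
    The chain is assumed to be unichain; `ref` is a hypothetical choice of
    reference state at which the relative value function is set to zero.
    """
    n = len(r)
    # Evaluation equations: w(s) + g - sum_j p[s][j] * w(j) = r(s), w(ref) = 0.
    a = [[(1.0 if i == j else 0.0) - p[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        a[i][ref] = 1.0  # column `ref` now carries the gain g instead of w(ref)
    x = solve(a, r)
    g = x[ref]
    w = x[:]
    w[ref] = 0.0

    # Stationary distribution: pi P = pi, with one balance equation
    # replaced by the normalization sum(pi) = 1.
    a_pi = [[p[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a_pi[ref] = [1.0] * n
    b_pi = [0.0] * n
    b_pi[ref] = 1.0
    pi = solve(a_pi, b_pi)

    # The bias is the relative value function shifted so that pi . bias = 0.
    shift = sum(pi[s] * w[s] for s in range(n))
    bias = [w[s] - shift for s in range(n)]
    return g, w, bias


# Illustrative two-state chain (made up for this sketch).
P = [[0.5, 0.5],
     [0.5, 0.5]]
r = [1.0, 0.0]
g, w, bias = gain_and_bias(P, r)
print(g, w, bias)
```

In this toy chain the gain is the same from both states, while the bias records the transient advantage of starting in the rewarding state: the expected total of reward minus gain accumulated before the chain forgets where it started.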
