Measure-Valued Differentiation for Stationary Markov Chains

We study general state-space Markov chains that depend on a parameter, say, θ. Sufficient conditions are established for the stationary performance of such a Markov chain to be differentiable with respect to θ. Specifically, we study the case of unbounded performance functions and thereby extend the known result on weak differentiability of stationary distributions of Markov chains to unbounded mappings. First, a closed-form formula for the derivative of the stationary performance of a general state-space Markov chain is given using an operator-theoretic approach. In a second step, we translate the derivative formula into unbiased gradient estimators. Specifically, we establish phantom-type estimators and score function estimators. We illustrate our results with examples from queueing theory.
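To make the two estimator families named above concrete, the following is a minimal sketch for a much simpler setting than the one treated in the paper: a single exponentially distributed random variable with rate θ rather than a stationary Markov chain. It assumes the standard decomposition of the derivative of the exponential density into a scaled difference between the exponential density and the Erlang-2 density, which gives a phantom-type (measure-valued) estimator, and the standard score 1/θ − x, which gives a score function estimator. The function names `mvd_estimate` and `score_function_estimate` are hypothetical and chosen only for illustration.

```python
import random


def mvd_estimate(f, theta, n, rng):
    """Measure-valued derivative estimate of d/dtheta E[f(X)], X ~ Exp(rate=theta).

    Assumes the identity
        d/dtheta E[f(X)] = (1/theta) * (E[f(X)] - E[f(X + Y)]),
    with X, Y independent Exp(theta), so that X + Y is Erlang-2.
    The nominal sample f(X) and the "phantom" sample f(X + Y) share X.
    """
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(theta)
        y = rng.expovariate(theta)
        total += (f(x) - f(x + y)) / theta
    return total / n


def score_function_estimate(f, theta, n, rng):
    """Score function estimate of d/dtheta E[f(X)], X ~ Exp(rate=theta).

    Uses the score d/dtheta log(theta * exp(-theta * x)) = 1/theta - x.
    """
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(theta)
        total += f(x) * (1.0 / theta - x)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    theta = 2.0
    # Unbounded performance function f(x) = x; E[X] = 1/theta,
    # so the exact derivative is -1/theta**2.
    print("exact:         ", -1.0 / theta**2)
    print("MVD:           ", mvd_estimate(lambda x: x, theta, 100_000, rng))
    print("score function:", score_function_estimate(lambda x: x, theta, 100_000, rng))
```

Both estimators are unbiased in this toy case. With f(x) = x the measure-valued estimator reduces to −Y/θ, which has noticeably lower variance than the score function estimator here. The stationary Markov chain estimators developed in the paper require additional structure that this sketch does not attempt to reproduce.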
