Paper information - Measure-Valued Differentiation for Stationary Markov Chains

Measure-Valued Differentiation for Stationary Markov Chains

We study general state-space Markov chains that depend on a parameter, say, θ. Sufficient conditions are established for the stationary performance of such a Markov chain to be differentiable with respect to θ. Specifically, we study the case of unbounded performance functions and thereby extend the known result on weak differentiability of stationary distributions of Markov chains to unbounded mappings. First, a closed-form formula for the derivative of the stationary performance of a general state-space Markov chain is given using an operator-theoretic approach. In a second step, we translate the derivative formula into unbiased gradient estimators. Specifically, we establish phantom-type estimators and score function estimators. We illustrate our results with examples from queueing theory.
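To make the idea of a closed-form derivative of stationary performance concrete, the sketch below works through a finite-state analogue. It is not the paper's operator-theoretic formula for general state spaces. It relies on the standard matrix identity for finite ergodic chains, dπ/dθ = π P′ (I − P + Π)⁻¹, where Π is the matrix whose rows all equal π. The two-state chain, the second transition probability `q`, the reward vector `g`, and all function names are assumptions chosen for illustration.

```python
# Illustrative finite-state analogue (an assumption, not the paper's setting):
# a two-state chain with transition matrix P = [[1-theta, theta], [q, 1-q]].

def stationary_2state(theta, q):
    """Stationary distribution of P = [[1-theta, theta], [q, 1-q]]."""
    s = theta + q
    return [q / s, theta / s]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def stationary_derivative(theta, q):
    """Derivative of pi in theta via pi * P' * (I - P + Pi)^{-1}."""
    pi = stationary_2state(theta, q)
    P = [[1.0 - theta, theta], [q, 1.0 - q]]
    dP = [[-1.0, 1.0], [0.0, 0.0]]  # entrywise derivative of P in theta
    # A = I - P + Pi, where every row of Pi equals pi
    A = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(2)]
         for i in range(2)]
    Z = inverse_2x2(A)
    row = [sum(pi[i] * dP[i][j] for i in range(2)) for j in range(2)]  # pi * P'
    return [sum(row[k] * Z[k][j] for k in range(2)) for j in range(2)]


def performance_derivative(theta, q, g):
    """Derivative in theta of the stationary performance sum_j pi_j * g_j."""
    d = stationary_derivative(theta, q)
    return sum(d[j] * g[j] for j in range(2))


if __name__ == "__main__":
    theta, q = 0.3, 0.5
    print(stationary_derivative(theta, q))
    print(performance_derivative(theta, q, [0.0, 1.0]))
```

For this chain the result can be checked by hand, since π = (q/(θ+q), θ/(θ+q)) and hence dπ₁/dθ = q/(θ+q)² for the second state. The estimators discussed in the abstract address the harder case where no such explicit inverse is available and the derivative must be estimated from simulation.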

Arie Hordijk | Bernd Heidergott | Heinz Weisshaupt
