A mean field approach for optimization in discrete time

This paper investigates the limit behavior of Markov decision processes made of independent objects evolving in a common environment, as the number of objects $N$ goes to infinity. In the finite-horizon case, we show that as the number of objects becomes large, the optimal cost of the system converges to the optimal cost of a deterministic discrete-time system. Convergence also holds for optimal policies. We further bound the speed of convergence by proving second-order results, resembling central limit theorems, for the cost and the state of the Markov decision process, with explicit formulas for the limit. These bounds, of order $1/\sqrt{N}$, are shown to be tight in a numerical example. One can go further and obtain convergence of order $\sqrt{\log N}/N$ to a stochastic system made of the mean field limit and a Gaussian term. Our framework is applied to a brokering problem in grid computing. We report several simulations with growing numbers of processors, comparing the performance of the limit system's optimal policy, applied to the finite system, with that of classical policies by measuring its asymptotic gain. Several extensions are also discussed. In particular, for infinite-horizon problems with discounted costs, we show that first-order limits hold and that second-order results also hold as long as the discount factor is small enough. For infinite-horizon problems with non-discounted costs, examples show that even first-order limits may fail.
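The first- and second-order behavior described above can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes a hypothetical two-state object, transition probabilities `p01`/`p10` that depend on the fraction `m` of objects in state 1 (a simple mean field interaction), a hypothetical stage cost `cost`, and a three-point action set `ACTIONS`. It finds the optimal action sequence of the deterministic limit system by exhaustive search, applies that same sequence to systems of N objects, and measures how far the realized cost is from the limit cost.

```python
import itertools
import math
import random

# Hypothetical model, NOT taken from the paper: every object is in state 0 or 1,
# m is the fraction of objects in state 1 and a is the controller's action.
ACTIONS = (0.0, 0.5, 1.0)  # hypothetical finite action set
HORIZON = 6                # finite horizon T (kept small for exhaustive search)
M0 = 0.2                   # initial fraction of objects in state 1


def p01(m, a):
    """Probability that an object in state 0 moves to state 1 (depends on m)."""
    return 0.6 * a * (1.0 - m)


def p10(m):
    """Probability that an object in state 1 falls back to state 0 (depends on m)."""
    return 0.2 + 0.3 * m


def cost(m, a):
    """Hypothetical stage cost: objects in state 0 and control effort are penalized."""
    return (1.0 - m) + 0.4 * a


def mean_field_cost(actions, m0=M0):
    """Total cost of the deterministic limit system under an action sequence."""
    m, total = m0, 0.0
    for a in actions:
        total += cost(m, a)
        # Deterministic mean field recursion: expected next fraction in state 1.
        m = m * (1.0 - p10(m)) + (1.0 - m) * p01(m, a)
    return total


def optimal_mean_field_actions(m0=M0):
    """Optimal action sequence of the limit system, by exhaustive search."""
    return min(itertools.product(ACTIONS, repeat=HORIZON),
               key=lambda seq: mean_field_cost(seq, m0))


def simulate_cost(n, actions, rng, m0=M0):
    """Total cost of one run of the N-object system under the same actions."""
    k = round(m0 * n)  # number of objects currently in state 1
    total = 0.0
    for a in actions:
        m = k / n
        total += cost(m, a)
        q10, q01 = p10(m), p01(m, a)
        # Each object moves independently, given the current empirical fraction m.
        stay = sum(rng.random() >= q10 for _ in range(k))
        move_up = sum(rng.random() < q01 for _ in range(n - k))
        k = stay + move_up
    return total


def mean_abs_error(n, actions, runs=200, seed=0):
    """Average |N-object cost - mean field cost| over independent runs."""
    rng = random.Random(seed)
    limit = mean_field_cost(actions)
    errors = (abs(simulate_cost(n, actions, rng) - limit) for _ in range(runs))
    return sum(errors) / runs


if __name__ == "__main__":
    best = optimal_mean_field_actions()
    print("limit-optimal actions:", best, "limit cost:", round(mean_field_cost(best), 4))
    for n in (100, 400, 1600):
        err = mean_abs_error(n, best)
        # With an O(1/sqrt(N)) error, sqrt(N) * err should stay roughly flat.
        print(f"N={n:5d}  error={err:.4f}  sqrt(N)*error={math.sqrt(n) * err:.3f}")
```

Under these assumptions, the printed `sqrt(N)*error` should stay roughly constant as N grows, which is qualitatively consistent with the $1/\sqrt{N}$ order stated in the abstract. Exhaustive search is used only because the horizon is tiny; it is a stand-in for dynamic programming on the limit system and does not reproduce the paper's brokering example.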