Paper information - Distributed asynchronous policy iteration in dynamic programming

We consider the distributed solution of dynamic programming (DP) problems by policy iteration. We envision a network of processors, each updating asynchronously a local policy and a local cost function, defined on a portion of the state space. The computed values are communicated asynchronously between processors and are used to perform the local policy and cost updates. The natural algorithm of this type can fail even under favorable circumstances, as shown by Williams and Baird [WiB93]. We propose an alternative and almost as simple algorithm, which converges to the optimum under the most general conditions, including asynchronous updating by multiple processors using outdated local cost functions of other processors.
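To make the setting concrete, here is a minimal sketch of the natural asynchronous scheme the abstract refers to, simulated in a single Python process. It is not the alternative algorithm proposed in the paper, which the abstract does not spell out. Everything named here is an assumption made for illustration: a finite-state discounted problem with discount factor `alpha`, transition probabilities `P[u][i][j]`, stage costs `g[i][u]`, the helper names, and the coin flips that stand in for delayed communication.

```python
import random


def random_mdp(n=4, num_actions=2, seed=1):
    """Hypothetical test problem: random transition rows and stage costs."""
    rng = random.Random(seed)
    P = []
    for _ in range(num_actions):
        rows = []
        for _ in range(n):
            w = [rng.random() + 0.1 for _ in range(n)]
            s = sum(w)
            rows.append([x / s for x in w])  # each row sums to 1
        P.append(rows)
    g = [[rng.random() for _ in range(num_actions)] for _ in range(n)]
    return P, g


def q_values(P, g, alpha, J, i):
    """Q(i, u) = g[i][u] + alpha * sum_j P[u][i][j] * J[j], for every action u."""
    n = len(J)
    return [g[i][u] + alpha * sum(P[u][i][j] * J[j] for j in range(n))
            for u in range(len(P))]


def value_iteration(P, g, alpha, iters=2000):
    """Synchronous reference solution, used only for comparison."""
    J = [0.0] * len(g)
    for _ in range(iters):
        J = [min(q_values(P, g, alpha, J, i)) for i in range(len(J))]
    return J


def natural_async_pi(P, g, alpha, blocks, steps=20000, seed=0):
    """Simulate processors that own disjoint blocks of states.

    Each processor keeps a possibly outdated copy of the other
    processors' costs and interleaves local policy improvement and
    local policy evaluation steps in arbitrary order.
    """
    rng = random.Random(seed)
    n = len(g)
    g_max = max(max(row) for row in g)
    J = [g_max / (1.0 - alpha)] * n    # latest cost of each state, held by its owner
    mu = [0] * n                       # latest policy of each state, held by its owner
    views = [list(J) for _ in blocks]  # each processor's copy of all costs
    for _ in range(steps):
        k = rng.randrange(len(blocks))  # processor that wakes up
        if rng.random() < 0.5:          # messages from the others arrive only sometimes
            views[k] = list(J)
        for i in blocks[k]:
            views[k][i] = J[i]          # a processor always knows its own costs
        i = rng.choice(blocks[k])
        q = q_values(P, g, alpha, views[k], i)
        if rng.random() < 0.5:
            mu[i] = q.index(min(q))     # local policy improvement at state i
        else:
            J[i] = q[mu[i]]             # local policy evaluation step at state i
    return J, mu


if __name__ == "__main__":
    P, g = random_mdp()
    J_star = value_iteration(P, g, 0.9)
    J, mu = natural_async_pi(P, g, 0.9, blocks=[[0, 1], [2, 3]])
    print("reference costs:", J_star)
    print("async costs:    ", J)
    print("async policy:   ", mu)
```

The sketch starts every cost at the upper bound `g_max / (1 - alpha)`, so by monotonicity of the updates the computed costs stay between the optimal costs and that bound. It makes no claim of convergence in general: as the abstract notes, schemes of this kind can fail [WiB93], which is the motivation for the paper's alternative algorithm.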

Dimitri P. Bertsekas | Huizhen Yu

[1] Dimitri P. Bertsekas, et al. Stochastic optimal control: the discrete time case, 2007.

[2] D. Bertsekas. The auction algorithm: A distributed relaxation method for the assignment problem, 1988.

[3] F. Robert. Contraction en norme vectorielle: Convergence d'iterations chaotiques pour des equations non linéaires de point fixe à plusieurs variables, 1976.

[4] M. Tarazi. Some convergence results for asynchronous algorithms, 1982.

[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[6] Andrew G. Barto, et al. Reinforcement learning, 1998.

[7] D. Bertsekas, et al. Partially asynchronous, parallel algorithms for network flow and other problems, 1990.

[8] Dimitri P. Bertsekas, et al. Dual coordinate step methods for linear network flow problems, 1988, Math. Program.

[9] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.

[10] Didier El Baz, et al. Asynchronous Iterative Algorithms with Flexible Communication for Nonlinear Network Flow Problems, 1996, J. Parallel Distributed Comput.

[11] V. Borkar. Asynchronous Stochastic Approximations, 1998.

[12] Dimitri P. Bertsekas, et al. Q-learning and enhanced policy iteration in discounted dynamic programming, 2010, CDC.

[13] Ronald J. Williams, et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Cr, 1993.

[14] John N. Tsitsiklis, et al. On the stability of asynchronous iterative processes, 1986, 1986 25th IEEE Conference on Decision and Control.

[15] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[16] Dimitri P. Bertsekas, et al. Q-learning and enhanced policy iteration in discounted dynamic programming, 2010, 49th IEEE Conference on Decision and Control (CDC).

[17] Dimitri Bertsekas, et al. Distributed dynamic programming, 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.

[18] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[19] Dimitri P. Bertsekas, et al. Distributed asynchronous computation of fixed points, 1983, Math. Program.

[20] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.

[21] D. Bertsekas, et al. Distributed asynchronous relaxation methods for convex network flow problems, 1987.

[22] J. C. Miellou, et al. Algorithmes de relaxation chaotique à retards, 1975.

[23] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.

[24] D. Bertsekas. Monotone Mappings with Application in Dynamic Programming, 1977.

[25] J. Walrand, et al. Distributed Dynamic Programming, 2022.

[26] Gérard M. Baudet, et al. Asynchronous Iterative Methods for Multiprocessors, 1978, JACM.

[27] Patrizia Beraldi, et al. A Parallel Asynchronous Implementation of the e-Relaxation Method for the Linear Minimum Cost Flow Problem, 1997, Parallel Comput.

[28] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.

[29] Dimitri P. Bertsekas, et al. Parallel synchronous and asynchronous implementations of the auction algorithm, 1991, Parallel Comput.