Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning
[1] H. Robbins. A Stochastic Approximation Method, 1951.
[2] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[3] Josef Hofbauer, et al. Stochastic Approximations and Differential Inclusions, 2005, SIAM J. Control. Optim.
[4] R. Bass, et al. Review: P. Billingsley, Convergence of probability measures, 1971.
[5] Shalabh Bhatnagar, et al. The Borkar-Meyn theorem for asynchronous stochastic approximations, 2011, Syst. Control. Lett.
[6] Yuxi Li, et al. Deep Reinforcement Learning: An Overview, 2017, ArXiv.
[7] J. Aubin, et al. Differential inclusions set-valued maps and viability theory, 1984.
[8] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[9] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[10] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[11] Josef Hofbauer, et al. Stochastic Approximations and Differential Inclusions, Part II: Applications, 2006, Math. Oper. Res.
[12] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.
[13] Shalabh Bhatnagar, et al. Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed point Iterations with Noise, 2017, ArXiv.
[14] D. Leslie, et al. Asynchronous stochastic approximation with differential inclusions, 2011, 1112.2288.
[15] Shalabh Bhatnagar, et al. Analysis of Gradient Descent Methods With Nondiminishing Bounded Errors, 2016, IEEE Transactions on Automatic Control.
[16] Rémi Munos, et al. Error Bounds for Approximate Value Iteration, 2005, AAAI.
[17] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[18] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[19] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[20] Michel Benaïm, et al. Asymptotic Pseudotrajectories and Chain Recurrent Flows, with Applications, 1996.
[21] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[22] Vivek S. Borkar, et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms, 1997, SIAM J. Control. Optim.
[23] M. Benaïm. A Dynamical System Approach to Stochastic Approximations, 1996.
[24] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[25] W. Hachem, et al. Constant step stochastic approximations involving differential inclusions: stability, long-run convergence and applications, 2016, Stochastics.
[26] Shalabh Bhatnagar, et al. A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions, 2015, Math. Oper. Res.