A Generalized Minimax Q-Learning Algorithm for Two-Player Zero-Sum Stochastic Games
Shalabh Bhatnagar | Raghuram Bharadwaj Diddigi | Chandramouli Kamanchi
[1] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[2] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[3] Shalabh Bhatnagar,et al. Successive Over Relaxation Q-Learning , 2019, IEEE Control. Syst. Lett..
[4] Manuela M. Veloso,et al. Rational and Convergent Learning in Stochastic Games , 2001, IJCAI.
[5] Dieter Reetz,et al. Solution of a Markovian decision problem by successive overrelaxation , 1973, Z. Oper. Research.
[6] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[7] Harold J. Kushner,et al. Stochastic approximation methods for constrained and unconstrained systems , 1978 .
[8] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[9] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[10] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[11] Michael L. Littman,et al. Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.