Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach