Quanyan Zhu | Tao Li | Guanze Peng
[1] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[2] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[3] Josef Hofbauer, et al. Stochastic Approximations and Differential Inclusions, 2005, SIAM J. Control. Optim..
[4] D. Fudenberg, et al. The Theory of Learning in Games, 1998.
[5] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn..
[6] Peter L. Bartlett, et al. Blackwell Approachability and No-Regret Learning are Equivalent, 2010, COLT.
[7] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[8] Paulo Martins Engel, et al. Dealing with non-stationary environments using context detection, 2006, ICML.
[9] Erwan Lecarpentier, et al. Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning, 2019, NeurIPS.
[10] D. Leslie, et al. Asynchronous stochastic approximation with differential inclusions, 2011, arXiv:1112.2288.
[11] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.
[12] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim..
[13] Josef Hofbauer, et al. Stochastic Approximations and Differential Inclusions, Part II: Applications, 2006, Math. Oper. Res..
[14] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[15] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[16] D. Blackwell. An analog of the minimax theorem for vector payoffs, 1956.
[17] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[18] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2008, Math. Oper. Res..
[19] Sergiu Hart, et al. Regret-based continuous-time dynamics, 2003, Games Econ. Behav..
[20] Ian A. Kash, et al. Combining No-regret and Q-learning, 2019, AAMAS.
[21] Quanyan Zhu, et al. On Convergence Rate of Adaptive Multiscale Value Function Approximation for Reinforcement Learning, 2019, 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP).
[22] Aleksandrs Slivkins, et al. Introduction to Multi-Armed Bandits, 2019, Found. Trends Mach. Learn..
[23] R. Bellman. On the Theory of Dynamic Programming, 1952, Proceedings of the National Academy of Sciences of the United States of America.
[24] Michael L. Littman, et al. Cyclic Equilibria in Markov Games, 2005, NIPS.
[25] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[26] Vianney Perchet, et al. Approachability, Regret and Calibration; implications and equivalences, 2013, arXiv.
[27] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res..