Convergent multiple-timescales reinforcement learning algorithms in normal form games
[1] L. Shapley. Some topics in two-person games , 1963 .
[2] S. Vajda. Some topics in two-person games , 1971 .
[3] J. Harsanyi. Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points , 1973 .
[4] Harold J. Kushner,et al. Stochastic approximation methods for constrained and unconstrained systems , 1978 .
[5] V. Nollau. Kushner, H. J./Clark, D. S., Stochastic Approximation Methods for Constrained and Unconstrained Systems. (Applied Mathematical Sciences 26). Berlin‐Heidelberg‐New York, Springer‐Verlag 1978. X, 261 S., 4 Abb., DM 26,40. US $ 13.20 , 1980 .
[6] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[7] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[8] L. Baird,et al. A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming , 1990 .
[9] R. Pemantle,et al. Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations , 1990 .
[10] J. Jordan. Three Problems in Learning Mixed-Strategy Nash Equilibria , 1993 .
[11] David M. Kreps,et al. Learning Mixed Equilibria , 1993 .
[12] Christopher Jones,et al. Geometric singular perturbation theory , 1995 .
[13] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[14] E. J. Collins,et al. A general technique for computing evolutionarily stable strategies based on errors in decision-making. , 1997, Journal of theoretical biology.
[15] V. Borkar. Stochastic approximation with two time scales , 1997 .
[16] Tilman Börgers,et al. Learning Through Reinforcement and Replicator Dynamics , 1997 .
[17] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[18] D. Fudenberg,et al. The Theory of Learning in Games , 1998 .
[19] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[20] E. Hopkins. A Note on Best Response Dynamics , 1999 .
[21] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[22] Vivek S. Borkar,et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[23] M. Benaïm. Dynamics of stochastic approximation algorithms , 1999 .
[24] M. Hirsch,et al. Mixed Equilibria and Dynamical Systems Arising from Fictitious Play in Perturbed Games , 1999 .
[25] Eric van Damme,et al. Non-Cooperative Games , 2000 .
[26] Peter Stone,et al. Implicit Negotiation in Repeated Games , 2001, ATAL.
[27] Peter Stone,et al. Leading Best-Response Strategies in Repeated Games , 2001, International Joint Conference on Artificial Intelligence.
[28] Vivek S. Borkar,et al. Reinforcement Learning in Markovian Evolutionary Games , 2002, Adv. Complex Syst..
[29] Josef Hofbauer,et al. Learning in perturbed asymmetric games , 2005, Games Econ. Behav..
[30] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.