[1] Humberto Bustince, et al. A Practical Guide to Averaging Functions, 2015, Studies in Fuzziness and Soft Computing.
[2] Eyal Amir, et al. Bayesian Inverse Reinforcement Learning, 2007, IJCAI.
[3] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[4] John Salvatier, et al. Theano: A Python framework for fast computation of mathematical expressions, 2016, ArXiv.
[5] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[6] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration, 1994, AAAI.
[9] Doina Precup, et al. Algorithms for multi-armed bandit problems, 2014, ArXiv.
[10] Chris L. Baker, et al. Goal Inference as Inverse Planning, 2007.
[11] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[12] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[13] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[14] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[15] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region, 2000, NIPS.
[16] D. Sofge. The Role of Exploration in Learning Control, 1992.
[17] Michael L. Littman, et al. Apprenticeship Learning About Multiple Intentions, 2011, ICML.
[18] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[19] D. Stahl, et al. Experimental evidence on players' models of other players, 1994.
[20] Doina Precup, et al. A Convergent Form of Approximate Policy Iteration, 2002, NIPS.
[21] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[22] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[23] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[24] D. Anderson, et al. Algorithms for minimization without derivatives, 1974.
[25] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[26] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[27] Kevin Leyton-Brown, et al. Beyond equilibrium: predicting human behaviour in normal form games, 2010, AAAI.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[29] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[30] Csaba Szepesvári, et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods, 2007, UAI.