[1] F. Downton. Stochastic Approximation, 1969, Nature.
[2] M. T. Wasan. Stochastic Approximation, 1969.
[3] E. C. Capen, et al. Competitive Bidding in High-Risk Situations, 1971.
[4] R. Thaler. Anomalies: The Winner's Curse, 1988.
[5] L. C. Baird, et al. Reinforcement learning in continuous time: advantage updating, 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[6] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration, 1994, AAAI.
[7] Mark D. Pendrith. On Reinforcement Learning of Control Actions in Noisy and Non-Markovian Domains, 1994.
[8] A. Harry Klopf, et al. Advantage Updating Applied to a Differential Game, 1994, NIPS.
[9] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[10] Mark D. Pendrith, et al. Estimator Variance in Reinforcement Learning: Theoretical Problems and Practical Solutions, 1997.
[11] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[12] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[13] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[14] Yishay Mansour, et al. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[15] Matthias W. Seeger, et al. PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification, 2003, J. Mach. Learn. Res.
[16] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[17] David A. McAllester. Simplified PAC-Bayesian Margin Bounds, 2003, COLT.
[18] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[19] E. Steen. Rational Overoptimism (and Other Biases), 2004.
[20] M. Tribus, et al. Probability theory: the logic of science, 2003.
[21] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[22] J. Langford. Tutorial on Practical Prediction Theory for Classification, 2005, J. Mach. Learn. Res.
[23] Robert L. Winkler, et al. The Optimizer's Curse: Skepticism and Postdecision Surprise in Decision Analysis, 2006, Manag. Sci.
[24] Emanuel Todorov, et al. Linearly-solvable Markov decision problems, 2006, NIPS.
[25] A. Moreno, et al. Noisy reinforcements in reinforcement learning: some case studies based on gridworlds, 2006.
[26] Warren B. Powell, et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics), 2007.
[27] Panos M. Pardalos, et al. Approximate dynamic programming: solving the curses of dimensionality, 2009, Optim. Methods Softw.
[28] Emanuel Todorov, et al. Efficient computation of optimal actions, 2009, Proceedings of the National Academy of Sciences.
[29] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[30] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[31] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[32] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[33] Joelle Pineau, et al. PAC-Bayesian Model Selection for Reinforcement Learning, 2010, NIPS.
[34] Marc Toussaint, et al. Approximate Inference and Stochastic Optimal Control, 2010, ArXiv.
[35] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[36] John Shawe-Taylor, et al. PAC-Bayesian Analysis of Contextual Bandits, 2011, NIPS.
[37] Hilbert J. Kappen, et al. Speedy Q-Learning, 2011, NIPS.
[38] Doina Precup, et al. An information-theoretic approach to curiosity-driven reinforcement learning, 2012, Theory in Biosciences.
[39] Warren B. Powell, et al. "Approximate Dynamic Programming: Solving the Curses of Dimensionality" by Warren B. Powell, 2007, Wiley Series in Probability and Statistics.
[40] Günther Palm, et al. Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax, 2011, KI.
[41] Naftali Tishby, et al. Trading Value and Information in MDPs, 2012.
[42] Warren B. Powell, et al. An Intelligent Battery Controller Using Bias-Corrected Q-learning, 2012, AAAI.
[43] Hilbert J. Kappen, et al. Dynamic policy programming, 2010, J. Mach. Learn. Res.
[44] Vicenç Gómez, et al. Optimal control as a graphical model inference problem, 2009, Machine Learning.
[45] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[46] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[47] Marc G. Bellemare, et al. Increasing the Action Gap: New Operators for Reinforcement Learning, 2015, AAAI.