Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning