Nonparametric Return Distribution Approximation for Reinforcement Learning
Masashi Sugiyama | Hisashi Kashima | Toshiyuki Tanaka | Tetsuro Morimura | Hirotaka Hachiya
[1] A. Kolmogoroff. Confidence Limits for an Unknown Distribution Function, 1941.
[2] W. Feller. On the Kolmogorov–Smirnov Limit Theorems for Empirical Distributions, 1948.
[3] Washington Hilton. National Conference on Artificial Intelligence, 1983.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[7] Andrew G. Barto, et al. Reinforcement learning, 1998.
[8] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[9] R. Rockafellar, et al. Optimization of conditional value-at-risk, 2000.
[10] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[11] Makoto Sato, et al. TD algorithm for the variance of return and mean-variance reinforcement learning, 2001.
[12] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[13] Timothy J. Robinson, et al. Sequential Monte Carlo Methods in Practice, 2003.
[14] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[15] A. Moore, et al. Learning decisions: robustness, uncertainty, and approximation, 2004.
[16] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[17] Fritz Wysotzki, et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints, 2005, J. Artif. Intell. Res.
[18] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[19] Mohammad Ghavamzadeh, et al. Bayesian actor-critic algorithms, 2007, ICML '07.
[20] Hisashi Kashima. Risk-Sensitive Learning via Minimization of Empirical Conditional Value-at-Risk, 2007, IEICE Trans. Inf. Syst.
[21] Louis Wehenkel, et al. Risk-aware decision making and dynamic programming, 2008.
[22] Masashi Sugiyama, et al. Least absolute policy iteration for robust value function approximation, 2009, 2009 IEEE International Conference on Robotics and Automation.
[23] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[24] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.