Adaptive importance sampling for value function approximation in off-policy reinforcement learning
Masashi Sugiyama | Jan Peters | Hirotaka Hachiya | Takayuki Akiyama
[1] Stefan Schaal, et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients, 2008.
[2] Klaus-Robert Müller, et al. Covariate Shift Adaptation by Importance Weighted Cross Validation, 2007, J. Mach. Learn. Res.
[3] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.
[4] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[5] J. Franklin, et al. The elements of statistical learning: data mining, inference and prediction, 2005.
[6] Motoaki Kawanabe, et al. Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression, 2004, Neural Computation.
[7] M. Bugeja, et al. Non-linear swing-up and stabilizing control of an inverted pendulum system, 2003, The IEEE Region 8 EUROCON 2003. Computer as a Tool.
[8] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[9] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[10] Ralf Schoknecht, et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, 2002, NIPS.
[11] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.
[12] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.
[13] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[14] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[15] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[16] H. Shimodaira, et al. Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.
[17] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[18] Ronald L. Wasserstein, et al. Monte Carlo: Concepts, Algorithms, and Applications, 1997.
[19] Kazuo Tanaka, et al. An approach to fuzzy control of nonlinear systems: stability and design issues, 1996, IEEE Trans. Fuzzy Syst.
[20] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[21] N. L. Johnson. Linear Statistical Inference and Its Applications, 1966.