Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
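The title refers to reward-weighted regression (RWR), an EM-style policy-search method in which the policy is refit by regression with samples weighted by their (transformed) rewards, and "sample reuse" refers to reweighting data from earlier policies via importance weights. The page itself gives no algorithmic detail, so the following is only a minimal toy sketch of the general RWR idea on a hypothetical linear-Gaussian task, not the paper's actual method; the task, function names, and constants are all illustrative assumptions.

```python
import numpy as np

# Toy sketch of reward-weighted regression (RWR), NOT the paper's algorithm:
# fit a linear-Gaussian policy a = theta * s + noise by least squares
# weighted by an exponentiated reward; old samples can be reused by
# multiplying in importance weights pi_new(a|s) / pi_old(a|s).

rng = np.random.default_rng(0)

def reward(s, a):
    # hypothetical task: the optimal gain is a = 2 * s
    return np.exp(-(a - 2.0 * s) ** 2)

def rwr_update(theta, states, actions, iw=None):
    """One RWR M-step: reward-weighted least squares for a ~ theta * s.

    iw: optional importance weights for samples drawn under an old policy.
    """
    w = reward(states, actions)
    if iw is not None:
        w = w * iw
    # closed-form weighted least-squares solution for the scalar gain
    return np.sum(w * states * actions) / np.sum(w * states ** 2)

theta, sigma = 0.0, 1.0  # initial gain and fixed exploration noise
for _ in range(20):
    s = rng.uniform(-1.0, 1.0, size=200)
    a = theta * s + sigma * rng.standard_normal(200)
    theta = rwr_update(theta, s, a)

# the fitted gain should approach the optimal value 2.0
print(theta)
```

Each iteration is a contraction toward the optimal gain here (for this Gaussian toy task the update behaves like theta ← (4 + theta) / 3), so 20 iterations suffice. In an actual sample-reuse setting, one would pass `iw` computed from the current and data-generating policy densities instead of discarding old rollouts.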
[1] Masashi Sugiyama, et al. Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error, 2006, J. Mach. Learn. Res.
[2] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[3] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.
[4] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[5] S. Vijayakumar, et al. Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling, 2004.
[6] Motoaki Kawanabe, et al. Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression, 2004, Neural Computation.
[7] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[8] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.
[9] Klaus-Robert Müller, et al. Covariate Shift Adaptation by Importance Weighted Cross Validation, 2007, J. Mach. Learn. Res.
[10] Masashi Sugiyama, et al. Adaptive importance sampling for value function approximation in off-policy reinforcement learning, 2009, Neural Networks.
[11] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[12] Masashi Sugiyama, et al. Input-dependent estimation of generalization error under covariate shift, 2005.
[13] H. Shimodaira, et al. Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.
[14] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[15] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.
[16] Pawel Wawrzynski, et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay, 2009, Neural Networks.
[17] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[18] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[19] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.
[20] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[21] Masashi Sugiyama, et al. Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning, 2009, IJCAI.
[22] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[23] Mark W. Spong, et al. The swing up control problem for the Acrobot, 1995.
[24] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[25] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.
[26] Stefan Schaal, et al. Natural Actor-Critic, Neurocomputing.
[27] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS.
[28] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[29] Masashi Sugiyama, et al. Efficient Sample Reuse in EM-Based Policy Search, 2009, ECML/PKDD.