Importance sampling policy gradient algorithms in reproducing kernel Hilbert space
TaeChoong Chung | Ngo Anh Vien | Tuyen Pham Le | P. Marlith Jaramillo
[1] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[2] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML.
[3] Peter Englert, et al. Policy Search in Reproducing Kernel Hilbert Space, 2016, IJCAI.
[4] Noah J. Cowan, et al. Efficient Gradient Estimation for Motor Control Learning, 2002, UAI.
[5] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[6] Jan Peters, et al. Learning concurrent motor skills in versatile solution spaces, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[7] Jan Peters, et al. Reinforcement Learning to Adjust Robot Movements to New Situations, 2010, IJCAI.
[8] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[9] Jun Morimoto, et al. Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration, 2012, Neural Computation.
[10] R. D. Nardi. The QRSim Quadrotors Simulator, 2013.
[11] Guy Lever, et al. Modelling Policies in MDPs in Reproducing Kernel Hilbert Space, 2015, AISTATS.
[12] A. Atiya, et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2005, IEEE Transactions on Neural Networks.
[13] Sergey Levine, et al. Variational Policy Search via Trajectory Optimization, 2013, NIPS.
[14] Jean-Philippe Thiran, et al. Kernel matching pursuit for large datasets, 2005, Pattern Recognition.
[15] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[16] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, Journal of Artificial Intelligence Research.
[17] Gerhard Neumann, et al. Variational Inference for Policy Search in changing situations, 2011, ICML.
[18] Jan Peters, et al. Reinforcement Learning to Adjust Parametrized Motor Primitives to New Situations, 2011.
[19] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[20] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[21] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[22] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[23] Charles A. Micchelli, et al. On Learning Vector-Valued Functions, 2005, Neural Computation.
[24] Masashi Sugiyama, et al. Adaptive importance sampling for value function approximation in off-policy reinforcement learning, 2009, Neural Networks.
[25] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[26] Peyman Milanfar, et al. A Tour of Modern Image Filtering: New Insights and Methods, Both Practical and Theoretical, 2013, IEEE Signal Processing Magazine.
[27] Pawel Wawrzynski, et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay, 2009, Neural Networks.
[28] Alex Smola, et al. Kernel methods in machine learning, 2007, math/0701907.
[29] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[30] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.
[31] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[32] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS.
[33] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks (2008 Special Issue).
[34] J. Bagnell, et al. Policy search in kernel Hilbert space, 2003.