Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer
Kenji Doya | Jiexin Wang | Eiji Uchibe
[1] Stefan Schaal,et al. Reinforcement learning by reward-weighted regression for operational space control , 2007, ICML '07.
[2] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.
[3] Kenji Doya,et al. Standing-up and Balancing Behaviors of Android Phone Robot -- Control of Spring-attached Wheeled Inverted Pendulum , 2013 .
[4] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.
[5] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[6] Tom Schaul,et al. Fitness Expectation Maximization , 2008, PPSN.
[7] Kenji Doya,et al. The Cyber Rodent Project: Exploration of Adaptive Mechanisms for Self-Preservation and Self-Reproduction , 2005, Adapt. Behav..
[8] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[9] Jun Morimoto,et al. Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration , 2012, Neural Computation.
[10] Gang Niu,et al. Analysis and Improvement of Policy Gradient Estimation , 2011, NIPS.
[11] Martin A. Riedmiller,et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[12] Olivier Sigaud,et al. Path Integral Policy Improvement with Covariance Matrix Adaptation , 2012, ICML.
[13] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[14] Stefan Schaal,et al. A Generalized Path Integral Control Approach to Reinforcement Learning , 2010, J. Mach. Learn. Res..
[15] Shie Mannor,et al. The Cross Entropy Method for Fast Policy Search , 2003, ICML.
[16] Jan Peters,et al. Policy Search for Motor Primitives in Robotics , 2022 .
[17] Nikolaus Hansen,et al. Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.
[18] Masashi Sugiyama,et al. Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning , 2011, Neural Computation.
[19] Kenji Doya,et al. EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot , 2015, Artificial Life and Robotics.
[20] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[21] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[22] Jan Peters,et al. Data-Efficient Generalization of Robot Skills with Contextual Policy Search , 2013, AAAI.
[23] Paolo Dario,et al. Special issue on robotics and neuroscience , 2008, Neural Networks.
[24] Luís Paulo Reis,et al. Regularized covariance estimation for weighted maximum likelihood policy search methods , 2015, 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids).