Eager and Memory-Based Non-Parametric Stochastic Search Methods for Learning Control

Direct policy search has been shown to be a successful method for optimizing robot controller parameters. However, defining a good parametric form for the controller can be challenging for complex problems. Non-parametric methods provide a flexible alternative and are thus a promising tool in robot skill learning. In this paper, we investigate two non-parametric methods that are based on similar principles but use different computing schedules: an eager learner and a memory-based learner. We compare the methods experimentally on two different control problems. Furthermore, we define and evaluate a new ‘hybrid’ controller that combines the strong points of both methods.
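The difference between the two computing schedules lies in when the work happens. An eager learner does its expensive computation when data arrives, while a memory-based learner stores the samples and defers all computation to query time. The sketch below illustrates only that difference. It rests on assumptions that do not come from the paper: each context maps to a vector of controller parameters, samples are weighted by exp(R/η) as in reward-weighted regression, and both learners use an RBF kernel with a fixed bandwidth. The eager learner is a weighted kernel ridge regression, and the memory-based learner is a Nadaraya-Watson kernel average. The class names and hyperparameters are hypothetical, and this is not the authors' algorithm.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between contexts a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def reward_weights(rewards, temperature):
    """Weights exp(R / eta), shifted by max(R) so the largest weight is 1."""
    r = np.asarray(rewards, dtype=float)
    return np.exp((r - r.max()) / temperature)


class EagerLearner:
    """Hypothetical eager learner: heavy computation in fit, cheap queries."""

    def __init__(self, bandwidth=0.5, reg=1e-3, temperature=1.0):
        self.bandwidth, self.reg, self.temperature = bandwidth, reg, temperature

    def fit(self, contexts, params, rewards):
        self.contexts = np.asarray(contexts, dtype=float)
        # Floor the weights so 1 / w stays finite for very poor samples.
        w = np.maximum(reward_weights(rewards, self.temperature), 1e-12)
        gram = rbf_kernel(self.contexts, self.contexts, self.bandwidth)
        # Weighted kernel ridge regression: low-reward samples get a larger
        # per-sample noise term reg / w_i and therefore pull the fit less.
        self.alpha = np.linalg.solve(gram + self.reg * np.diag(1.0 / w),
                                     np.asarray(params, dtype=float))
        return self

    def predict(self, query):
        k = rbf_kernel(np.atleast_2d(query), self.contexts, self.bandwidth)
        return (k @ self.alpha)[0]


class MemoryBasedLearner:
    """Hypothetical memory-based learner: fit only stores, work happens per query."""

    def __init__(self, bandwidth=0.5, temperature=1.0):
        self.bandwidth, self.temperature = bandwidth, temperature

    def fit(self, contexts, params, rewards):
        self.contexts = np.asarray(contexts, dtype=float)
        self.params = np.asarray(params, dtype=float)
        self.w = reward_weights(rewards, self.temperature)
        return self

    def predict(self, query):
        # Nadaraya-Watson average of stored parameters, weighted by both
        # context similarity and reward.
        k = rbf_kernel(np.atleast_2d(query), self.contexts, self.bandwidth)[0]
        weights = k * self.w
        return weights @ self.params / weights.sum()


# Toy usage: the (hypothetical) best controller parameter is 2 * context.
contexts = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
params = 2.0 * contexts
rewards = np.zeros(20)  # equal rewards -> uniform reward weights
eager = EagerLearner().fit(contexts, params, rewards)
lazy = MemoryBasedLearner().fit(contexts, params, rewards)
print(eager.predict([0.5]), lazy.predict([0.5]))
```

In this toy form, both learners still touch every stored context when they answer a query. The difference is in when the remaining work is done. The eager learner pays for an O(n³) linear solve once per batch of data. The memory-based learner pays nothing up front and does all of its weighting at query time, so new samples are available immediately without refitting.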
