Eager and Memory-Based Non-Parametric Stochastic Search Methods for Learning Control

Direct policy search has been shown to be a successful method for optimizing robot controller parameters. However, defining a good parametric form for the controller can be challenging for complex problems. Non-parametric methods provide a flexible alternative and are thus a promising tool in robot skill learning. In this paper, we investigate two non-parametric methods based on similar principles but with different computation schedules: an eager learner and a memory-based learner. We compare the methods experimentally on two different control problems. Furthermore, we define and evaluate a new ‘hybrid’ controller that combines the strengths of both methods.
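
The "computation schedule" distinction is the classic eager-versus-lazy split in non-parametric learning. The sketch below is only an illustration of that split, not the paper's algorithm: it contrasts an eager kernel ridge regressor, which precomputes its dual weights once at training time, with a memory-based Nadaraya-Watson regressor, which merely stores the data and defers all computation to query time. The RBF kernel, bandwidth `bw`, and regularizer `reg` are assumed placeholder choices.

```python
# Minimal sketch (assumptions noted above): eager vs. memory-based
# non-parametric regression with a squared-exponential (RBF) kernel.
import numpy as np

def rbf(A, B, bw=0.5):
    """Pairwise RBF kernel between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

class EagerKernelRegressor:
    """Eager schedule: the heavy computation happens once, at training time."""
    def fit(self, X, y, reg=1e-6):
        self.X = X
        K = rbf(X, X)
        # Precompute the dual weights (kernel ridge regression solve).
        self.alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)
        return self

    def predict(self, Xq):
        # Queries are cheap: one kernel evaluation and a dot product.
        return rbf(Xq, self.X) @ self.alpha

class MemoryBasedRegressor:
    """Lazy schedule: training just stores the data; work is done per query."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, Xq):
        # Nadaraya-Watson: locally weighted average computed at query time.
        W = rbf(Xq, self.X)
        return (W @ self.y) / W.sum(axis=1)

# Toy usage: both learners represent the mapping as data plus a kernel.
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
Xq = np.array([[0.25], [0.75]])
print(EagerKernelRegressor().fit(X, y).predict(Xq))
print(MemoryBasedRegressor().fit(X, y).predict(Xq))
```

The eager learner pays an up-front solve (cubic in the number of samples) in exchange for cheap queries; the memory-based learner trains instantly but touches every stored sample at each query. This is the trade-off between the two computation schedules that the paper evaluates.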
