Learning Parameterized Skills

We introduce a method for constructing skills capable of solving tasks drawn from a distribution of parameterized reinforcement learning problems. The method draws example tasks from a distribution of interest and uses the corresponding learned policies to estimate the topology of the lower-dimensional piecewise-smooth manifold on which the skill policies lie. This manifold models how policy parameters change as task parameters vary. The method identifies the number of charts that compose the manifold and then applies non-linear regression in each chart to construct a parameterized skill by predicting policy parameters from task parameters. We evaluate our method on an underactuated simulated robotic arm tasked with learning to accurately throw darts at a parameterized target location.
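To make the pipeline concrete, below is a minimal sketch of the approach under stated assumptions: it uses off-the-shelf stand-ins (Isomap for the manifold estimate, k-means for chart assignment, and support vector regression per chart), synthetic toy data in place of learned policies, an assumed chart count K, and an illustrative `predict_policy` helper. It is not the paper's implementation, only an outline of the task-parameters-to-policy-parameters mapping it describes.

```python
# Sketch of the parameterized-skill pipeline: estimate the manifold of
# policies, split it into charts, and regress policy parameters from
# task parameters within each chart. All data and hyperparameters here
# are illustrative assumptions.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Toy training set: task parameters tau (e.g., 2-D target locations) and
# the policy parameters theta that policy search would return for each
# task. Here theta is a synthetic piecewise-smooth function of tau.
N, TAU_DIM, THETA_DIM = 200, 2, 4
tau = rng.uniform(-1.0, 1.0, size=(N, TAU_DIM))
theta = np.concatenate(
    [np.sin(3.0 * tau), np.where(tau[:, :1] > 0, tau**2, -(tau**2))],
    axis=1,
)

# 1) Estimate the lower-dimensional structure of the policy manifold.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(theta)

# 2) Partition the embedded policies into K charts. K is assumed known
#    here; the method identifies it from the data.
K = 2
charts = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(embedding)

# 3) Within each chart, fit one nonlinear regressor per policy dimension
#    that predicts policy parameters from task parameters.
models = {
    k: [SVR(kernel="rbf", C=10.0).fit(tau[charts == k], theta[charts == k, d])
        for d in range(THETA_DIM)]
    for k in range(K)
}

def predict_policy(tau_new):
    """Map a new task to the chart of its nearest training task
    (1-nearest-neighbor on tau, for simplicity), then predict the
    policy parameters with that chart's regressors."""
    nearest = np.argmin(np.linalg.norm(tau - tau_new, axis=1))
    k = charts[nearest]
    return np.array([m.predict(tau_new[None, :])[0] for m in models[k]])

# Query the parameterized skill for an unseen task.
print(predict_policy(np.array([0.3, -0.5])))
```

In this sketch the chart classifier is a nearest-neighbor rule over task parameters; any classifier that assigns new tasks to charts would fill the same role.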
