Reinforcement learning in robotics: A survey

Reinforcement learning offers robotics a framework and a set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the two disciplines holds sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions have tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free methods, as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail, we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
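To make the value-function-based family mentioned above concrete, the following is a minimal, illustrative tabular Q-learning sketch on a toy corridor task. This is not an example from the survey itself; the task, state space, and all parameter values are our own assumptions, chosen only to show the update rule in its simplest form:

```python
import random

# Toy task: a 1-D corridor of states 0..4, start at 0, goal at 4.
# Actions: 0 = step left, 1 = step right. Reward 1 on reaching the goal.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)

def step(s, a):
    """Deterministic transition; reward only when the goal is reached."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, float(s2 == GOAL)

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular value function Q(s, a)
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # epsilon-greedy exploration
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[s][act])
            s2, r = step(s, a)
            # Q-learning temporal-difference update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
# The learned greedy policy should move right in every non-goal state.
policy = [max(ACTIONS, key=lambda act: q[s][act]) for s in range(GOAL)]
print(policy)
```

In a robotic setting, the same update is rarely usable verbatim: states and actions are continuous and high-dimensional, which is precisely why the survey's distinctions between function approximation, model-based learning, and direct policy search matter.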
