Learning Options for an MDP from Demonstrations

The options framework provides a foundation for using hierarchical actions in reinforcement learning. An agent equipped with options can, at any point in time, choose to execute a macro-action composed of many primitive actions instead of a single primitive action. Such macro-actions can be hand-crafted or learned, and previous work has learned them by exploring the environment. Here we take a different perspective and present an approach to learning options from a set of expert demonstrations. We also report empirical results in a setting similar to the one used in other work in this area.
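To make the framework concrete, an option is conventionally defined as a triple: an initiation set of states where it may start, an intra-option policy over primitive actions, and a per-state termination probability. The sketch below is illustrative only; the class and the toy corridor domain are our own assumptions, not taken from the paper.

```python
import random

class Option:
    """A minimal sketch of an option as the triple (I, pi, beta):
    initiation set, intra-option policy, and termination condition.
    All names here are illustrative assumptions."""

    def __init__(self, initiation_set, policy, termination_prob):
        self.initiation_set = initiation_set      # states where the option may start
        self.policy = policy                      # maps state -> primitive action
        self.termination_prob = termination_prob  # maps state -> prob. of stopping

    def can_start(self, state):
        return state in self.initiation_set

    def act(self, state):
        return self.policy[state]

    def should_terminate(self, state, rng=random):
        # Unlisted states default to never terminating in this sketch.
        return rng.random() < self.termination_prob.get(state, 0.0)


# Toy 1-D corridor with states 0..4 and a hand-crafted macro-action
# "go right until the end of the corridor".
go_right = Option(
    initiation_set={0, 1, 2, 3},
    policy={s: "right" for s in range(4)},
    termination_prob={4: 1.0},  # terminate only once state 4 is reached
)

state = 0
assert go_right.can_start(state)
while not go_right.should_terminate(state):
    state += 1 if go_right.act(state) == "right" else -1
print(state)  # 4
```

A learned option would replace the hand-crafted policy and termination map with quantities estimated from data, e.g. from the expert demonstrations considered in this paper.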
