Utilizing negative policy information to accelerate reinforcement learning

ACKNOWLEDGEMENTS

One consequence of my long tenure at Georgia Tech has been the opportunity to get to know a large number of exceptional people. I'm thankful for tremendous support from some particularly exceptional people: my advisor Charles Isbell, a gentleman and a scholar, who gave me the opportunity to think outside the box; and my adoptive advisor Andrea Thomaz, who helped me see that it's also quite a good idea to stop and focus on the box long enough to put a bow on it.
