Utilizing negative policy information to accelerate reinforcement learning
[1] Charles Elkan, et al. Learning classifiers from only positive and unlabeled data, 2008, KDD.
[2] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[3] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.
[4] Eric Wiewiora, et al. Potential-Based Shaping and Q-Value Initialization are Equivalent, 2003, J. Artif. Intell. Res.
[5] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[6] Andrea Lockerd Thomaz, et al. Teachable robots: Understanding human teaching behavior to build more effective robot learners, 2008, Artif. Intell.
[7] Julian Togelius, et al. Mario AI competition, 2009, 2009 IEEE Symposium on Computational Intelligence and Games.
[8] Aude Billard, et al. Donut as I do: Learning from failed demonstrations, 2011, 2011 IEEE International Conference on Robotics and Automation.
[9] Alberto Maria Segre, et al. Programs for Machine Learning, 1994.
[10] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[11] Scott Kuindersma, et al. Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories, 2010, NIPS.
[12] Peter Stone, et al. Reinforcement learning from simultaneous human and MDP reward, 2012, AAMAS.
[13] Rémi Gilleron, et al. Learning from positive and unlabeled examples, 2000, Theor. Comput. Sci.
[14] K. Subramanian, et al. Learning Options through Human Interaction, 2011.
[15] Andrew G. Barto, et al. Efficient skill learning using abstraction selection, 2009, IJCAI 2009.
[16] J Glaser, et al. Separation of Concerns, 2014.
[17] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[18] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[19] Bing Liu, et al. Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression, 2003, ICML.
[20] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[21] Peng Zhou, et al. Discovering options from example trajectories, 2009, ICML '09.
[22] Marco Colombetti, et al. Robot Shaping: An Experiment in Behavior Engineering, 1997.
[23] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[24] Julian Togelius, et al. The 2009 Mario AI Competition, 2010, IEEE Congress on Evolutionary Computation.
[25] Peng Zang, et al. Scaling solutions to Markov Decision Problems, 2011.
[26] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.
[27] Peter Stone, et al. Interactively shaping agents via human reinforcement: the TAMER framework, 2009, K-CAP '09.
[28] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs, 1999, IJCAI.
[29] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[30] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[31] Kee-Eung Kim, et al. Solving Very Large Weakly Coupled Markov Decision Processes, 1998, AAAI/IAAI.
[32] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML 2000.
[33] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[34] Mitsuo Kawato, et al. Inter-module credit assignment in modular reinforcement learning, 2003, Neural Networks.
[35] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[36] Brett Browning, et al. Learning robot motion control with demonstration and advice-operators, 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[37] Doina Precup, et al. Learning Options in Reinforcement Learning, 2002, SARA.
[38] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SIGKDD Explor.
[39] Shie Mannor, et al. Dynamic abstraction in reinforcement learning via clustering, 2004, ICML.
[40] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[41] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.