Convergent Actor Critic by Humans
David L. Roberts | Matthew E. Taylor | Michael L. Littman | Bei Peng | James MacGlashan | Robert Loftin
[1] David L. Roberts, et al. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning, 2015, Autonomous Agents and Multi-Agent Systems.
[2] Peter Stone, et al. Interactively shaping agents via human reinforcement: the TAMER framework, 2009, K-CAP '09.
[3] Fiery Cushman, et al. Teaching with Rewards and Punishments: Reinforcement or Communication?, 2015, CogSci.
[4] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artificial Intelligence.
[5] Andrea Lockerd Thomaz, et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning, 2013, NIPS.
[6] David L. Roberts, et al. A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans, 2016, AAMAS.
[7] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.
[8] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Automatica.
[9] John Loch, et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.
[10] W. Bradley Knox, et al. Learning from human-generated reward, 2012.
[11] Eduardo F. Morales, et al. Dynamic Reward Shaping: Training a Robot by Voice, 2010, IBERAMIA.
[12] Cynthia Breazeal, et al. Training a Robot via Human Feedback: A Case Study, 2013, ICSR.