A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback
David L. Roberts | Matthew E. Taylor | Michael L. Littman | Bei Peng | James MacGlashan | Jeff Huang | Robert Tyler Loftin
[1] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.
[2] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[3] Peter Stone, et al. A social reinforcement learning agent, 2001, AGENTS '01.
[4] Jeffrey Heer, et al. Presiding over accidents: system direction of human action, 2004, CHI.
[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[6] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.
[7] Paul R. Schrater, et al. Bayesian modeling of human sequential decision-making on the multi-armed bandit problem, 2008.
[8] Peter Stone, et al. Interactively shaping agents via human reinforcement: the TAMER framework, 2009, K-CAP '09.
[9] Martin Pál, et al. Contextual Multi-Armed Bandits, 2010, AISTATS.
[10] Bilge Mutlu, et al. How Do Humans Teach: On Curriculum Learning and Teaching Dimension, 2011, NIPS.
[11] Christopher M. Anderson. Ambiguity aversion in multi-armed bandit problems, 2012.
[12] Manuel Lopes, et al. Algorithmic and Human Teaching of Sequential Decision Tasks, 2012, AAAI.
[13] Bradley C. Love, et al. A New Experimental Perspective, 2012.
[14] Andrea Lockerd Thomaz, et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning, 2013, NIPS.