Teaching with Rewards and Punishments: Reinforcement or Communication?

Teaching with evaluative feedback involves expectations about how a learner will interpret rewards and punishments. We formalize two hypotheses about how a teacher implicitly expects a learner to interpret feedback: a reward-maximizing model based on standard reinforcement learning, and an action-feedback model based on research on communicative intent. We then describe a virtual animal-training task that distinguishes the two. In two experiments, people gave feedback to learners either for isolated actions (Exp. 1) or while the learners learned over time (Exp. 2); the results of both support the action-feedback model over the reward-maximizing model.
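The abstract does not spell out either model. The contrast can be illustrated with a minimal, hypothetical sketch. It is not the paper's task, models, or parameters. The setting is a five-state corridor where the teacher's target policy is "always move right" toward a terminal goal. The reward-maximizing learner treats each piece of feedback as a reward r(s, a) and plans with value iteration, using an assumed discount of 0.9. The action-feedback learner, in this deliberately simplified reading, takes the most positive feedback at a state as a signal of which action the teacher intends. The corridor, both teacher functions, and every name below are illustrative assumptions.

```python
# Hypothetical illustration only (not the paper's task or models): a 1-D corridor
# in which the teacher's target policy is "always move right" toward a terminal goal.
N_STATES = 5            # states 0..4; state 4 is the terminal goal (assumed)
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # move left or move right
GAMMA = 0.9             # learner's discount factor (assumed)


def step(s, a):
    """Deterministic transition; moves are clipped at the walls."""
    return min(max(s + a, 0), GOAL)


def teacher_feedback(s, a):
    """Assumed 'reward-only' teacher: +1 for the target action, 0 otherwise."""
    return 1.0 if a == +1 else 0.0


def punishing_teacher_feedback(s, a):
    """Assumed teacher who also punishes: +1 for the target action, -1 otherwise."""
    return 1.0 if a == +1 else -1.0


def reward_maximizing_policy(feedback, gamma=GAMMA, iters=500):
    """Treat feedback as reward r(s, a); return the greedy policy after value iteration."""
    v = [0.0] * N_STATES  # v[GOAL] stays 0 because the goal is terminal

    def q(s, a):
        return feedback(s, a) + gamma * v[step(s, a)]

    for _ in range(iters):
        # Synchronous backup: every q() call below reads the previous sweep's v.
        v = [0.0 if s == GOAL else max(q(s, a) for a in ACTIONS)
             for s in range(N_STATES)]
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in range(GOAL)}


def action_feedback_policy(feedback):
    """Read feedback as a signal of the intended action, not as reward to accumulate."""
    return {s: max(ACTIONS, key=lambda a: feedback(s, a)) for s in range(GOAL)}


if __name__ == "__main__":
    print(reward_maximizing_policy(teacher_feedback))            # {0: 1, 1: 1, 2: 1, 3: -1}
    print(action_feedback_policy(teacher_feedback))              # {0: 1, 1: 1, 2: 1, 3: 1}
    print(reward_maximizing_policy(punishing_teacher_feedback))  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Consider a teacher who only rewards correct moves. The reward-maximizing learner stops one step short of the goal and shuttles between states 2 and 3. That positive reward cycle is worth about 4.74 in discounted value, while entering the goal is worth 1. When the teacher also punishes incorrect moves, the cycle disappears, and the reward-maximizing learner also walks to the goal. The action-feedback learner follows the target policy under either teacher, because it never sums feedback over time.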