Modeling Humans as Reinforcement Learners: How to Predict Human Behavior in Multi-Stage Games

This paper introduces a novel framework for modeling interacting humans in a multi-stage game environment by combining concepts from game theory and reinforcement learning. The proposed model has the following desirable characteristics: (1) players are boundedly rational, (2) players are strategic (i.e., they account for one another's reward functions), and (3) the model is computationally feasible even on moderately large real-world systems. To achieve this, we extend level-K reasoning to policy space so that, for the first time, it can handle multiple time steps. This allows us to decompose the problem into a series of smaller ones, to each of which standard reinforcement learning algorithms can be applied. We investigate these ideas in a cyber-battle scenario over a smart power grid and discuss the relationship between the behavior predicted by our model and what one might expect of real human defenders and attackers.
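The decomposition described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' model: a hypothetical two-player attacker/defender game with three damage levels and four stages, a uniform-random level-0 policy, and exact finite-horizon backward induction standing in for the standard reinforcement learning algorithms the paper applies (the toy dynamics are fully known, so no sampling is needed). A level-k player treats its opponent's fixed level-(k-1) policy as part of the environment, which turns each rung of the hierarchy into an ordinary single-agent MDP. The optional `beta` parameter replaces the exact best response with a softmax (logit) response, one hypothetical way to encode bounded rationality; the names `step`, `best_response`, and `level_k_policies` are likewise invented here.

```python
import numpy as np

# Hypothetical toy attacker/defender game (an assumption, not the paper's
# smart-grid model). State s is a damage level in {0, 1, 2}.
N_STATES = 3
HORIZON = 4          # number of stages in the game
ACTIONS = (0, 1)     # attacker: 0 = wait, 1 = attack; defender: 0 = idle, 1 = defend
ATTACKER, DEFENDER = 0, 1


def step(s, a_att, a_def):
    """One stage: return the next state and (attacker, defender) rewards."""
    if a_att == 1 and a_def == 0:        # unopposed attack raises damage
        s_next = min(s + 1, N_STATES - 1)
    elif a_att == 0 and a_def == 1:      # unopposed repair lowers damage
        s_next = max(s - 1, 0)
    else:                                # attack blocked, or nobody acts
        s_next = s
    r_att = s_next - 0.2 * a_att         # attacker gains damage, pays effort
    r_def = -s_next - 0.1 * a_def        # defender loses damage, pays effort
    return s_next, (r_att, r_def)


def best_response(me, opp_policy, beta=None):
    """Solve the single-agent MDP faced by player `me` when the opponent
    follows the fixed time-dependent policy `opp_policy[t, s, b]`.

    Uses exact finite-horizon backward induction (a stand-in for an RL
    algorithm). If `beta` is given, returns a softmax (logit) response
    with inverse temperature `beta` instead of the greedy best response.
    """
    n_a = len(ACTIONS)
    policy = np.zeros((HORIZON, N_STATES, n_a))
    V = np.zeros(N_STATES)               # value after the final stage
    for t in reversed(range(HORIZON)):
        Q = np.zeros((N_STATES, n_a))
        for s in range(N_STATES):
            for a in ACTIONS:
                for b in ACTIONS:        # opponent action, averaged out
                    joint = (a, b) if me == ATTACKER else (b, a)
                    s_next, rewards = step(s, *joint)
                    Q[s, a] += opp_policy[t, s, b] * (rewards[me] + V[s_next])
        if beta is None:
            policy[t] = np.eye(n_a)[Q.argmax(axis=1)]
        else:
            z = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
            policy[t] = z / z.sum(axis=1, keepdims=True)
        V = (policy[t] * Q).sum(axis=1)  # expected value under own policy
    return policy


def level_k_policies(K, beta=None):
    """Build level-0..K policies; level k best-responds to level k-1."""
    uniform = np.full((HORIZON, N_STATES, len(ACTIONS)), 1.0 / len(ACTIONS))
    levels = {0: (uniform, uniform)}     # (attacker, defender) at level 0
    for k in range(1, K + 1):
        prev_att, prev_def = levels[k - 1]
        levels[k] = (best_response(ATTACKER, prev_def, beta),
                     best_response(DEFENDER, prev_att, beta))
    return levels


levels = level_k_policies(K=2)
# Level-2 attacker's action probabilities at damage 0 in the last stage.
print(levels[2][ATTACKER][HORIZON - 1, 0])  # -> [1. 0.]
```

With greedy responses in this toy game, the level-1 attacker attacks in every state because it expects a defender that defends only half the time, while the level-2 attacker, anticipating a level-1 defender that always defends, waits at damage level 0, where attacking only costs effort. Each level costs one single-agent solve per player, which is the sense in which the hierarchy breaks one strategic problem into a series of smaller ones.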