Robust solutions to Stackelberg games: Addressing bounded rationality and limited observations in human cognition

How do we build algorithms for agent interactions with human adversaries? Stackelberg games are natural models for many important applications that involve human interaction, such as oligopolistic markets and security domains. In a Stackelberg game, one player, the leader, commits to a strategy, and the follower makes her decision with knowledge of the leader's commitment. Existing algorithms for Stackelberg games efficiently find an optimal leader strategy, but they critically assume that the follower plays optimally. Unfortunately, in many applications, agents face human followers (adversaries) whose decisions are biased by bounded rationality and limited observation of the leader's strategy, so they may deviate from the expected optimal response. Ignoring these likely deviations may cause an unacceptable degradation in the leader's reward, particularly in security applications where these algorithms have been deployed. The objective of this paper is therefore to investigate how to build algorithms for agent interactions with human adversaries.
To address this crucial problem, this paper introduces a new mixed-integer linear program (MILP) for Stackelberg games that accounts for human adversaries by incorporating (i) novel anchoring theories on human perception of probability distributions and (ii) robustness approaches for MILPs to address human imprecision. Because the new approach considers human adversaries, traditional proofs of correctness or optimality are insufficient; instead, it is necessary to rely on empirical validation.
To that end, this paper considers four settings based on real deployed security systems at Los Angeles International Airport (Pita et al., 2008) and compares six approaches (three based on the new approach and three previous approaches) under four observability conditions, involving 218 human subjects playing 2960 games in total. The final conclusion is that a model incorporating both robustness and anchoring achieves statistically significantly higher rewards while maintaining equivalent or faster solution speeds compared to existing approaches.
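The commitment setting described in the abstract can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the paper's MILP: the 2x2 payoff tables, the function names, the grid search used in place of a solver, and the anchoring model, which simply blends the leader's true mixed strategy with the uniform distribution using a weight `alpha`. The `pessimistic` flag shows how the leader's reward can drop when an indifferent follower does not break ties in the leader's favor.

```python
from fractions import Fraction

# Hypothetical 2x2 payoff tables: rows are leader actions, columns are follower actions.
LEADER = [[2, 4], [1, 3]]
FOLLOWER = [[1, 0], [0, 2]]


def anchor(x, alpha):
    """Assumed perception model: blend the true mixed strategy with the uniform one."""
    n = len(x)
    return [alpha * Fraction(1, n) + (1 - alpha) * xi for xi in x]


def best_responses(perceived):
    """Follower columns that maximize the follower's expected payoff under `perceived`."""
    values = [
        sum(p * row[j] for p, row in zip(perceived, FOLLOWER))
        for j in range(len(FOLLOWER[0]))
    ]
    top = max(values)
    return [j for j, v in enumerate(values) if v == top]


def leader_reward(x, alpha=Fraction(0), pessimistic=False):
    """Leader's expected reward when the follower best-responds to the anchored strategy.

    Ties between follower best responses are broken in the leader's favor,
    or against the leader when `pessimistic` is True.
    """
    rewards = [
        sum(p * row[j] for p, row in zip(x, LEADER))
        for j in best_responses(anchor(x, alpha))
    ]
    return min(rewards) if pessimistic else max(rewards)


def best_commitment(alpha=Fraction(0), steps=300):
    """Grid search over the leader's probability of playing the first row."""
    grid = [[Fraction(k, steps), 1 - Fraction(k, steps)] for k in range(steps + 1)]
    return max(grid, key=lambda x: leader_reward(x, alpha))


if __name__ == "__main__":
    x = best_commitment()
    print("commitment:", x, "reward:", leader_reward(x))
    print("reward if ties go against the leader:", leader_reward(x, pessimistic=True))
    y = best_commitment(alpha=Fraction(1, 2))
    print("commitment against an anchoring follower:", y)
```

In this toy game the best commitment leaves the follower exactly indifferent, so a small deviation by the follower changes the leader's reward sharply. That sensitivity is the kind of fragility the abstract attributes to solutions that assume an optimal follower.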

Sarit Kraus | Manish Jain | Milind Tambe | Fernando Ordóñez | James Pita

[1] Craig Boutilier, et al. Elicitation of Factored Utilities, 2008, AI Mag.

[2] H. Simon, et al. Rational choice and the structure of the environment, 1956, Psychological review.

[3] Li Chen, et al. User-Involved Preference Elicitation for Product Search and Recommender Systems, 2008, AI Mag.

[4] Robert T. Clemen, et al. Subjective Probability Assessment in Decision Analysis: Partition Dependence and Bias Toward the Ignorance Prior, 2005, Manag. Sci.

[5] R. McKelvey, et al. Quantal Response Equilibria for Normal Form Games, 1995.

[6] A. Haurie, et al. Sequential Stackelberg equilibria in two-person games, 1985.

[7] Richard W. Pew, et al. Modeling human and organizational behavior: application to military simulations, 1998.

[8] M. Friedman. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance, 1937.

[9] Guy H. Walker, et al. Human Factors Methods: A Practical Guide for Engineering and Design, 2012.

[10] A. Tversky, et al. Support theory: A nonextensional representation of subjective probability, 1994.

[11] Sarit Kraus, et al. Playing games for security: an efficient exact algorithm for solving Bayesian Stackelberg games, 2008, AAMAS.

[12] E.E.C. van Damme, et al. Games with imperfectly observable commitment, 1997.

[13] Jean Cardinal, et al. Pricing of Geometric Transportation Networks, 2009, CCCG.

[14] B. Stengel, et al. Leadership with commitment to mixed strategies, 2004.

[15] Alex Pentland, et al. Modeling and Prediction of Human Behavior, 1999, Neural Computation.

[16] Julia Kastner, et al. Introduction to Robust Estimation and Hypothesis Testing, 2005.

[17] Fernando Ordóñez, et al. Robust Wardrop Equilibrium, 2007, NET-COOP.

[18] Milind Tambe, et al. Security and Game Theory: IRIS – A Tool for Strategic Security Allocation in Transportation Networks, 2011, AAMAS 2011.

[19] G. Leitmann. On generalized Stackelberg strategies, 1978.

[20] Sarit Kraus, et al. Adversarial Uncertainty in Multi-Robot Patrol, 2009, IJCAI.

[21] A. Tversky, et al. Unpacking, repacking, and anchoring: advances in support theory, 1997.

[22] Ya'akov Gal, et al. Learning Social Preferences in Games, 2004, AAAI.

[23] Reinhard Selten, et al. Evolutionary stability in extensive two-person games - correction and further development, 1988.

[24] Vincent Conitzer, et al. Stackelberg vs. Nash in security games: interchangeability, equivalence, and uniqueness, 2010, AAMAS.

[25] J. Neumann. Zur Theorie der Gesellschaftsspiele, 1928.

[26] M. Dufwenberg. Game theory, 2011, Wiley interdisciplinary reviews. Cognitive science.

[27] K. Yuen, et al. The two-sample trimmed t for unequal population variances, 1974.

[28] Sarit Kraus, et al. Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport, 2008, AAMAS.

[29] Craig R. Fox, et al. Subjective Probability Assessment in Decision Analysis, 2005.

[30] Herbert A. Simon, et al. The Sciences of the Artificial, 1970.

[31] Karl Tuyls, et al. Artificial agents learning human fairness, 2008, AAMAS.

[32] Manish Jain, et al. Computing optimal randomized resource allocations for massive security games, 2009, AAMAS.

[33] Kelly E. See, et al. Between ignorance and truth: Partition dependence and learning in judgment under uncertainty, 2006, Journal of experimental psychology. Learning, memory, and cognition.

[34] Dimitris Bertsimas, et al. Robust game theory, 2006, Math. Program.

[35] R. Selten. Reexamination of the perfectness concept for equilibrium points in extensive games, 1975, Classics in Game Theory.

[36] C. Pahl-Wostl, et al. Heuristics to characterise human behaviour in agent based models, 2004.

[37] Sarit Kraus, et al. Effective solutions for real-world Stackelberg games: when agents must deal with human uncertainties, 2009, AAMAS.

[38] Sarit Kraus, et al. Negotiating with bounded rational agents in environments with incomplete information using an automated agent, 2008, Artif. Intell.

[39] Felix Várdy, et al. The value of commitment in Stackelberg games with observation costs, 2004, Games Econ. Behav.

[40] Colin Camerer. Behavioral Game Theory: Experiments in Strategic Interaction, 2003.

[41] Laurent El Ghaoui, et al. Robustness in Markov Decision Problems with Uncertain Transition Matrices, 2003, NIPS.

[42] Miroslaw Truszczynski, et al. Preferences and Nonmonotonic Reasoning, 2008, AI Mag.

[43] C. Starmer. Developments in Non-expected Utility Theory: The Hunt for a Descriptive Theory of Choice under Risk, 2000.

[44] A. Tversky, et al. Subjective Probability: A Judgment of Representativeness, 1972.

[45] Yoav Shoham, et al. If multi-agent learning is the answer, what is the question?, 2007, Artif. Intell.

[46] Vincent Conitzer, et al. Computing the optimal strategy to commit to, 2006, EC '06.

[47] Richard C. Larson, et al. A hypercube queuing model for facility location and redistricting in urban emergency services, 1974, Comput. Oper. Res.

[48] R. Selten, et al. A Generalized Nash Solution for Two-Person Bargaining Games with Incomplete Information, 1972.

[49] Ya'akov Gal, et al. Predicting people's bidding behavior in negotiation, 2006, AAMAS '06.

[50] A. Tversky, et al. Prospect Theory: An Analysis of Decision under Risk, 2007.

[51] Singiresu S. Rao, et al. Optimization Theory and Applications, 1980, IEEE Transactions on Systems, Man, and Cybernetics.

[52] S. Tijs. Nash equilibria for noncooperative n-person games in normal form, 1981.

[53] Ariel Orda, et al. Achieving network optima using Stackelberg routing strategies, 1997, TNET.

[54] Sarit Kraus, et al. Security in multiagent systems by policy randomization, 2006, AAMAS '06.

[55] Rand R. Wilcox, et al. How many discoveries have been lost by ignoring modern statistical methods, 1998.

[56] R. Selten. Evolutionary stability in extensive two-person games, 1983.

[57] Sarit Kraus, et al. The impact of adversarial knowledge on adversarial planning in perimeter patrol, 2008, AAMAS.

[58] A. Rubinstein. Modeling Bounded Rationality, 1998.

[59] Gerald G. Brown, et al. Defending Critical Infrastructure, 2006, Interfaces.

[60] K. Bagwell. Commitment and observability in games, 1995.

[61] Judy Goldsmith, et al. Preference Handling for Artificial Intelligence, 2008, AI Mag.

[62] D. Koehler, et al. Probability matching in choice under uncertainty: Intuition versus deliberation, 2009, Cognition.

[63] Vincent Conitzer, et al. Stackelberg vs. Nash in security games: interchangeability, equivalence, and uniqueness, 2010, AAMAS 2010.

[64] Ariel Rubinstein, et al. A Course in Game Theory, 1995.

[65] Craig R. Fox, et al. Partition Priming in Judgment Under Uncertainty, 2003, Psychological science.

[66] Sarit Kraus, et al. Facing the challenge of human-agent negotiations via effective general opponent modeling, 2009, AAMAS.

[67] Edgar Brunner, et al. Rank-Score Tests in Factorial Designs with Repeated Measures, 1999.