Regret Minimization Algorithms for the Followers Behaviour Identification in Leadership Games

We study, for the first time, a leadership game in which one agent, acting as leader, faces another agent, acting as follower, whose behaviour is not known a priori to the leader but is one of a set of possible behavioural profiles. The main motivation is that, in real-world applications, the common game-theoretic assumption of perfect rationality is rarely met, and any specific assumption about a bounded-rationality model could, if wrong, lead to a significant loss for the leader. The question we pose is whether and how the leader can learn the behavioural profile of a follower in leadership games. This is a “natural” online identification problem: the leader aims to identify the follower’s behavioural profile in order to best exploit the opponent’s potential non-rationality, while minimizing the regret due to the initial lack of information. We propose two algorithms based on different approaches and provide a regret analysis. Furthermore, we experimentally evaluate the pseudo-regret of the algorithms in concrete leadership games, showing that our algorithms outperform state-of-the-art online learning algorithms.
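
To make the identification problem concrete, here is a minimal Python sketch. It is an illustrative optimistic-elimination scheme and is not one of the two algorithms proposed here. It rests on several assumptions that are ours: a hypothetical 2x2 game (`LEADER_U`, `FOLLOWER_U`), leader commitments restricted to a finite grid of mixed strategies, and a finite set of deterministic behavioural profiles that contains the true one. The profiles are a rational best responder, a hypothetical "spiteful" follower, and a hypothetical constant follower. In each round the leader commits to the strategy that looks best under the most favourable profile still consistent with past observations. It then observes the follower's response and discards every profile that predicted a different action. Pseudo-regret is measured against the best grid commitment for the true profile.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical 2x2 leadership game (rows: leader actions, columns: follower actions).
LEADER_U = [[2.0, 4.0], [1.0, 3.0]]
FOLLOWER_U = [[1.0, 0.0], [0.0, 1.0]]

Strategy = Tuple[float, float]       # leader mixed strategy over the two rows
Profile = Callable[[Strategy], int]  # behavioural profile: commitment -> follower action

def expected(u: List[List[float]], x: Strategy, j: int) -> float:
    """Expected payoff of column j when the leader mixes rows according to x."""
    return sum(xi * row[j] for xi, row in zip(x, u))

def best_responder(x: Strategy) -> int:
    # Perfectly rational follower; ties go to the lowest column index (assumption).
    values = [expected(FOLLOWER_U, x, j) for j in range(2)]
    return values.index(max(values))

def spiteful_responder(x: Strategy) -> int:
    # Hypothetical follower that minimizes the leader's expected payoff.
    values = [expected(LEADER_U, x, j) for j in range(2)]
    return values.index(min(values))

def constant_responder(x: Strategy) -> int:
    # Hypothetical follower that ignores the commitment and always plays column 1.
    return 1

def identify(profiles: Dict[str, Profile], true_profile: Profile,
             commitments: Sequence[Strategy], rounds: int) -> Tuple[List[str], float]:
    """Optimistic elimination; returns surviving profile names and cumulative pseudo-regret."""
    def value(x: Strategy, profile: Profile) -> float:
        return expected(LEADER_U, x, profile(x))

    consistent = dict(profiles)
    # Benchmark used only for evaluation: the leader never sees true_profile directly.
    opt = max(value(x, true_profile) for x in commitments)
    regret = 0.0
    for _ in range(rounds):
        # Commit to the strategy that is best under the most favourable consistent profile.
        x = max(commitments, key=lambda c: max(value(c, p) for p in consistent.values()))
        action = true_profile(x)  # observe the follower's response
        regret += opt - expected(LEADER_U, x, action)
        # Discard every profile whose prediction disagrees with the observation.
        consistent = {n: p for n, p in consistent.items() if p(x) == action}
    return sorted(consistent), regret

PROFILES = {"best": best_responder, "spiteful": spiteful_responder,
            "constant": constant_responder}
COMMITMENTS = [(p, 1.0 - p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]

survivors, regret = identify(PROFILES, best_responder, COMMITMENTS, rounds=10)
print(survivors, regret)  # → ['best'] 1.25
```

With deterministic profiles, a round costs the leader utility only if every profile that looked most favourable for the chosen commitment mispredicted the follower's action. Those profiles are then eliminated. Under these assumptions, positive pseudo-regret is therefore incurred in at most |P| − 1 rounds; in the run above, a single round costs 1.25. Stochastic behavioural profiles would need statistical rather than exact consistency checks.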
