Modeling Friends and Foes

How can one detect friendly and adversarial behavior from raw data? Detecting whether an environment is a friend, a foe, or anything in between remains a poorly understood yet desirable ability for safe and robust agents. This paper proposes a definition of these environmental "attitudes" based on a characterization of the environment's ability to react to the agent's private strategy. We define an objective function for a one-shot game that allows deriving both the environment's probability distribution under friendly and adversarial assumptions and the agent's optimal strategy. Furthermore, we present an algorithm to compute these equilibrium strategies, and show experimentally that both friendly and adversarial environments possess non-trivial optimal strategies.
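
To make the setup concrete, below is a minimal numerical sketch of the kind of equilibrium computation described above; it is not the paper's actual objective or algorithm. It assumes a quantal-response-style (softmax) environment whose reaction to the agent's mixed strategy is tilted by an attitude parameter beta (negative for adversarial, positive for friendly, zero for indifferent), a softmax agent with rationality alpha, and a damped fixed-point iteration between the two. The function name attitude_equilibrium, the payoff matrix, and all parameter values are illustrative assumptions.

# Illustrative sketch only (not the paper's exact objective or algorithm).
# Assumptions: agent payoff matrix U[a, e]; a softmax (quantal-response-style)
# environment whose reaction to the agent's mixed strategy is tilted by an
# "attitude" parameter beta (beta > 0: friendly, beta < 0: adversarial,
# beta = 0: indifferent); a softmax agent with rationality alpha; damped
# fixed-point iteration to find mutually consistent strategies.
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def attitude_equilibrium(U, beta, alpha=2.0, lr=0.2, iters=500):
    """Damped fixed-point iteration for the agent strategy p and the
    environment's reactive distribution q in a one-shot game U[a, e]."""
    n_a, _ = U.shape
    p = np.full(n_a, 1.0 / n_a)            # agent's mixed strategy over rows
    for _ in range(iters):
        # Environment reacts to p: weights each response by the agent's
        # expected payoff, scaled by its attitude beta.
        q = softmax(beta * (p @ U))
        # Agent moves a small step toward its softmax response to q
        # (damping keeps the iteration from oscillating).
        p = (1.0 - lr) * p + lr * softmax(alpha * (U @ q))
    return p, q

if __name__ == "__main__":
    U = np.array([[2.0, -1.0],
                  [-1.0, 1.0]])            # agent payoffs, rows = agent actions
    for beta in (-2.0, 0.0, 2.0):          # adversarial, indifferent, friendly
        p, q = attitude_equilibrium(U, beta)
        print(f"beta={beta:+.1f}  p={np.round(p, 3)}  q={np.round(q, 3)}  value={p @ U @ q:.3f}")

In this toy game one would expect the friendly setting to concentrate both distributions on the agent's highest-payoff cell, the indifferent setting to leave the environment uniform, and the adversarial setting to drive both players toward mixed strategies, echoing the abstract's point that both attitudes induce non-trivial equilibrium behavior.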
