Learning in Network Contexts: Experimental Results from Simulations

This paper describes the results of simulation experiments performed on a suite of learning algorithms. We focus on games in network contexts. These are contexts in which (1) agents have very limited information about the game: they do not know their own (or any other agent's) payoff function and merely observe the outcome of their play; and (2) play can be extremely asynchronous: agents update their strategies at very different rates. Many learning algorithms have been proposed in the literature. We choose a small sample of these algorithms and use numerical simulation to explore the nature of asymptotic play. In particular, we explore the extent to which asymptotic play depends on three factors: limited information, asynchronous play, and the degree of responsiveness of the learning algorithm.
