Learning against learning: evolutionary dynamics of reinforcement learning algorithms in strategic interactions

Imagine computer programs (agents) that learn to coordinate or to compete. This study investigates how their learning processes influence each other. Such adaptive agents already play vital roles behind the scenes of our society; for example, automated high-frequency traders participate in financial trading and generate more volume than human traders in some US markets. However, many learning algorithms have proven performance guarantees only if they act alone: as soon as a second agent influences the outcomes, most guarantees become invalid. This dissertation extends guarantees to strategic interactions of several agents and examines how closely algorithms approximate optimal behavior. This research was funded by a TopTalent 2008 grant of NWO.
