Using Reinforcement Learning to Validate Empirical Game-Theoretic Analysis: A Continuous Double Auction Study

Empirical game-theoretic analysis (EGTA) has recently been applied successfully to analyze the behavior of large numbers of competing traders in a continuous double auction (CDA) market. Multiagent simulation methods like EGTA are useful for studying complex strategic environments such as a stock market, where it is not feasible to solve analytically for the rational behavior of each agent. A weakness of simulation-based methods in strategic settings, however, is that it is typically impossible to prove that the strategy profile assigned to the simulated agents is stable, as in a Nash equilibrium. I propose using reinforcement learning to measure the regret of candidate Nash-equilibrium strategy profiles found by EGTA: if a learner trained to deviate from such a profile cannot discover a significantly more profitable strategy, the profile is at least empirically stable. I have developed a new library of reinforcement learning tools and integrated it into an extended version of the market simulator from our prior work. I provide evidence for the effectiveness of the library's methods, both on a suite of benchmark problems from the literature and on non-equilibrium strategy profiles in our market environment. Finally, I use these new reinforcement learning tools to provide evidence that the equilibria found by EGTA in our recent continuous double auction study are likely to have only negligible regret, even with respect to an extended strategy space.
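To make the validation idea concrete, the sketch below estimates the regret of a candidate equilibrium profile by learning a best response against it. It deliberately simplifies the full reinforcement-learning setup to an epsilon-greedy bandit over a small discrete set of deviation strategies; the names `sample_payoff`, `estimate_regret`, and the strategy labels are hypothetical stand-ins for illustration, not the simulator's actual API.

```python
"""Minimal sketch of regret estimation by learned deviation, assuming a
hypothetical one-agent-deviates simulator interface. All identifiers here
are illustrative, not the thesis library's real API."""
import random

def sample_payoff(deviation_strategy, background_profile):
    # Hypothetical stub for one noisy simulation episode in which a single
    # agent plays `deviation_strategy` while the rest of the market plays
    # `background_profile` (ignored by this toy stub).
    base = {"zi": 1.0, "shade_low": 1.1, "shade_high": 0.9}
    return random.gauss(base[deviation_strategy], 0.3)

def estimate_regret(background_profile, eq_strategy, deviations,
                    episodes=5000, epsilon=0.1):
    """Epsilon-greedy bandit: track the mean payoff of each candidate
    deviation, then report how much the best deviation outperforms the
    candidate equilibrium strategy itself (clipped at zero)."""
    counts = {s: 0 for s in deviations}
    means = {s: 0.0 for s in deviations}
    for _ in range(episodes):
        if random.random() < epsilon:
            s = random.choice(deviations)          # explore
        else:
            s = max(deviations, key=lambda d: means[d])  # exploit
        reward = sample_payoff(s, background_profile)
        counts[s] += 1
        means[s] += (reward - means[s]) / counts[s]  # incremental mean
    best = max(means.values())
    return max(0.0, best - means[eq_strategy])

if __name__ == "__main__":
    # Background agents all play the candidate equilibrium strategy "zi";
    # a nonzero estimate suggests a profitable deviation exists.
    profile = ["zi"] * 15
    print(estimate_regret(profile, "zi", ["zi", "shade_low", "shade_high"]))
```

In the actual study the deviating agent's strategy space is far richer, so a full reinforcement learner replaces the bandit, but the validation logic is the same: a near-zero estimated regret is evidence that the EGTA profile is stable.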
