Using Reinforcement Learning to Validate Empirical Game-Theoretic Analysis: A Continuous Double Auction Study

Empirical game-theoretic analysis (EGTA) has recently been applied successfully to analyze the behavior of large numbers of competing traders in a continuous double auction (CDA) market. Multiagent simulation methods like EGTA are useful for studying complex strategic environments such as a stock market, where it is not feasible to solve analytically for the rational behavior of each agent. A weakness of simulation-based methods in strategic settings, however, is that it is typically impossible to prove that the strategy profile assigned to the simulated agents is stable, as in a Nash equilibrium. I propose using reinforcement learning to measure the regret of candidate Nash-equilibrium strategy profiles found by EGTA: if a learner trained to deviate from such a profile cannot discover a significantly more profitable strategy, the profile is at least empirically stable. I have developed a new library of reinforcement learning tools and integrated it into an extended version of the market simulator from our prior work. I provide evidence for the effectiveness of the library's methods, both on a suite of benchmark problems from the literature and on non-equilibrium strategy profiles in our market environment. Finally, I use these new reinforcement learning tools to provide evidence that the equilibria found by EGTA in our recent continuous double auction study are likely to have only negligible regret, even with respect to an extended strategy space.
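To make the validation idea concrete, the sketch below estimates the regret of a candidate equilibrium profile by learning a best response against it. It deliberately simplifies the full reinforcement-learning setup to an epsilon-greedy bandit over a small discrete set of deviation strategies; the names `sample_payoff`, `estimate_regret`, and the strategy labels are hypothetical stand-ins for illustration, not the simulator's actual API.

```python
"""Minimal sketch of regret estimation by learned deviation, assuming a
hypothetical one-agent-deviates simulator interface. All identifiers here
are illustrative, not the thesis library's real API."""
import random

def sample_payoff(deviation_strategy, background_profile):
    # Hypothetical stub for one noisy simulation episode in which a single
    # agent plays `deviation_strategy` while the rest of the market plays
    # `background_profile` (ignored by this toy stub).
    base = {"zi": 1.0, "shade_low": 1.1, "shade_high": 0.9}
    return random.gauss(base[deviation_strategy], 0.3)

def estimate_regret(background_profile, eq_strategy, deviations,
                    episodes=5000, epsilon=0.1):
    """Epsilon-greedy bandit: track the mean payoff of each candidate
    deviation, then report how much the best deviation outperforms the
    candidate equilibrium strategy itself (clipped at zero)."""
    counts = {s: 0 for s in deviations}
    means = {s: 0.0 for s in deviations}
    for _ in range(episodes):
        if random.random() < epsilon:
            s = random.choice(deviations)          # explore
        else:
            s = max(deviations, key=lambda d: means[d])  # exploit
        reward = sample_payoff(s, background_profile)
        counts[s] += 1
        means[s] += (reward - means[s]) / counts[s]  # incremental mean
    best = max(means.values())
    return max(0.0, best - means[eq_strategy])

if __name__ == "__main__":
    # Background agents all play the candidate equilibrium strategy "zi";
    # a nonzero estimate suggests a profitable deviation exists.
    profile = ["zi"] * 15
    print(estimate_regret(profile, "zi", ["zi", "shade_low", "shade_high"]))
```

In the actual study the deviating agent's strategy space is far richer, so a full reinforcement learner replaces the bandit, but the validation logic is the same: a near-zero estimated regret is evidence that the EGTA profile is stable.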
