Pricing in Agent Economies Using Multi-Agent Q-Learning

This paper investigates how adaptive software agents can use reinforcement learning algorithms such as Q-learning to make economic decisions, such as setting prices in a competitive marketplace. For a single adaptive agent facing fixed-strategy opponents, ordinary Q-learning is guaranteed to find the optimal policy. For a population of agents in which each adapts in the presence of the others, however, the problem becomes non-stationary and history-dependent, and it is not known whether global convergence will be obtained or, if so, whether the resulting solutions will be optimal. In this paper, we study simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough that lookup tables can represent the Q-functions. Despite the lack of theoretical guarantees, we find simultaneous convergence to self-consistent optimal solutions in each model, at least for small values of the discount parameter; in some cases, exact or approximate convergence occurs even at large discount parameters. We show how the Q-derived policies increase profitability and damp out or eliminate cyclic price “wars” relative to simpler policies based on zero lookahead or short-term lookahead. In one of the models (the “Shopbot” model), where the sellers' profit functions are symmetric, Q-learning can produce either symmetric or broken-symmetry policies, depending on the discount parameter and on initial conditions.
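The simultaneous-learning setup described above can be sketched in code. The following is a minimal illustration, not the paper's actual models: the price grid, the undercutting profit function, and all parameter values are assumptions chosen for concreteness. Each seller keeps a lookup-table Q-function over states (here taken to be the rival's most recent price) and updates it while the other seller is learning at the same time, which is what makes the environment non-stationary from each agent's point of view:

```python
import random

random.seed(0)  # for reproducibility of this sketch

# Illustrative assumptions (not from the paper): a 10-point price grid,
# a toy "cheapest seller wins the buyer" profit function, and these
# learning parameters.
PRICES = [round(0.1 * i, 1) for i in range(1, 11)]
ALPHA, GAMMA, EPS = 0.1, 0.0, 0.1  # learning rate, discount, exploration

def profit(my_price, rival_price):
    """Toy undercutting market: the cheaper seller captures the sale."""
    if my_price < rival_price:
        return my_price           # win the sale at my price
    if my_price == rival_price:
        return my_price / 2.0     # split the market at equal prices
    return 0.0                    # undercut by the rival: no sale

# One Q-table per seller: Q[i][state][action], state = rival's last price.
Q = [{s: {a: 0.0 for a in PRICES} for s in PRICES} for _ in range(2)]

def choose(q_row):
    """Epsilon-greedy action selection over the price grid."""
    if random.random() < EPS:
        return random.choice(PRICES)
    return max(q_row, key=q_row.get)

# Both sellers learn simultaneously, so each faces a moving target.
state = [random.choice(PRICES), random.choice(PRICES)]
for _ in range(50_000):
    a0 = choose(Q[0][state[0]])
    a1 = choose(Q[1][state[1]])
    rewards = [profit(a0, a1), profit(a1, a0)]
    next_state = [a1, a0]  # each seller observes the rival's new price
    for i, a in enumerate((a0, a1)):
        best_next = max(Q[i][next_state[i]].values())
        Q[i][state[i]][a] += ALPHA * (
            rewards[i] + GAMMA * best_next - Q[i][state[i]][a]
        )
    state = next_state
```

With `GAMMA = 0` each agent learns a myopic best response to the rival's current behavior, loosely mirroring the small-discount regime in which the paper reports reliable convergence; raising `GAMMA` makes each agent value the rival's reaction to its price, which is where the richer multi-agent phenomena arise.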
