Pricing in Agent Economies Using Neural Networks and Multi-agent Q-Learning

This chapter has examined single-agent and multi-agent Q-learning in three models of a two-seller economy in which the sellers take turns setting prices, and instantaneous utilities are then given to both sellers based on the current price pair. Such models fall into the category of two-player, alternating-turn, arbitrary-sum Markov games in which both the rewards and the state transitions are deterministic. The game is Markov because the state is fully observable and the rewards are not history dependent. In all three models (Price-Quality, Information-Filtering, and Shopbot), large-amplitude cyclic price wars arise when the sellers myopically optimize their instantaneous utilities without regard to the longer-term impact of their pricing policies. In all three models, the use of Q-learning by one seller against a myopic opponent invariably results in exact convergence to the optimal Q-function and optimal policy against that opponent, for all allowed values of the discount parameter γ. The Q-derived policy yields greater expected profit for the Q-learner, with profit increasing monotonically in γ. In many cases it has the side benefit of enhancing social welfare by also increasing the expected profit of the myopic opponent; this comes about by reducing the amplitude of the undercutting price-war regime or, in some cases, eliminating it entirely.
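To make the setup concrete, the sketch below shows tabular Q-learning for one seller playing against a myopic (best-instantaneous-reply) opponent in a toy shopbot-style price game. It is not the chapter's actual Price-Quality, Information-Filtering, or Shopbot model: the price grid, the assumed buyer fraction `W`, the profit function, and all learning hyperparameters are illustrative assumptions, and rewards are only collected on the learner's own turns for simplicity.

```python
import numpy as np

# --- Hypothetical toy shopbot-style pricing game (illustrative assumptions only) ---
# Prices lie on a discrete grid; production cost is zero, so profit = price * market share.
# A fraction W of buyers uses a shopbot and buys from the cheaper seller; the rest split evenly.
N_PRICES = 25
PRICES = np.linspace(1.0 / N_PRICES, 1.0, N_PRICES)
W = 0.7  # assumed fraction of price-sensitive (shopbot) buyers

def profit(my_price, opp_price):
    """Instantaneous profit for the seller charging my_price against opp_price."""
    if my_price < opp_price:
        share = W + (1 - W) / 2      # win all shopbot buyers plus half of the loyal buyers
    elif my_price > opp_price:
        share = (1 - W) / 2          # keep only half of the loyal buyers
    else:
        share = 0.5                  # tie: split the market
    return my_price * share

def myopic_response(opp_price_idx):
    """Myopic opponent: maximize instantaneous profit against the other seller's current price."""
    payoffs = [profit(PRICES[i], PRICES[opp_price_idx]) for i in range(N_PRICES)]
    return int(np.argmax(payoffs))

def q_learn(gamma=0.9, alpha=0.1, epsilon=0.1, episodes=200, steps=200, seed=0):
    """Tabular Q-learning for one seller against the myopic opponent.

    State  = opponent's current price index (sufficient for the Markov property here).
    Action = the Q-learner's next price index.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_PRICES, N_PRICES))           # Q[state, action]
    for _ in range(episodes):
        opp = rng.integers(N_PRICES)             # random initial opponent price
        for _ in range(steps):
            s = opp
            # epsilon-greedy action selection
            a = rng.integers(N_PRICES) if rng.random() < epsilon else int(np.argmax(Q[s]))
            r = profit(PRICES[a], PRICES[s])     # reward from the price pair just formed
            opp = myopic_response(a)             # deterministic opponent reply -> next state
            # one-step Q-learning update
            Q[s, a] += alpha * (r + gamma * np.max(Q[opp]) - Q[s, a])
    return Q

Q = q_learn(gamma=0.9)
policy = Q.argmax(axis=1)  # greedy pricing policy: best reply price index for each opponent price
```

Because the myopic opponent's reply is a deterministic function of the learner's price, the learner effectively faces a single-agent Markov decision process, which is why ordinary tabular Q-learning can converge exactly to the optimal policy against such an opponent, consistent with the results summarized above.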
