Artificial Intelligence, Algorithmic Pricing and Collusion

Pricing algorithms are increasingly replacing human decision making in real marketplaces. To inform the competition policy debate on possible consequences, we run experiments with pricing algorithms powered by Artificial Intelligence in controlled environments (computer simulations).

In particular, we study the interaction among a number of Q-learning algorithms in the context of a workhorse oligopoly model of price competition with Logit demand and constant marginal costs. We show that the algorithms consistently learn to charge supra-competitive prices, without communicating with each other. The high prices are sustained by classical collusive strategies with a finite punishment phase followed by a gradual return to cooperation. This finding is robust to asymmetries in cost or demand and to changes in the number of players.
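
The kind of experiment described above can be illustrated with a minimal Python sketch: independent tabular Q-learners repeatedly set prices in a logit-demand market with constant marginal cost. The abstract does not specify the setup, so everything concrete here is an assumption made for illustration: the parameter values, the price grid, the state definition (last period's prices), the epsilon-greedy rule with exponentially decaying exploration, and the hypothetical function names `logit_demand`, `profits`, and `simulate`. It is not the authors' implementation and is not meant to reproduce their results.

```python
import math
import random


def logit_demand(prices, quality=2.0, outside=0.0, mu=0.25):
    """Logit demand share of each firm (all parameter values are assumed)."""
    weights = [math.exp((quality - p) / mu) for p in prices]
    denom = sum(weights) + math.exp(outside / mu)  # outside good keeps total share < 1
    return [w / denom for w in weights]


def profits(prices, cost=1.0):
    """Per-period profit with a constant marginal cost (assumed equal to 1)."""
    return [(p - cost) * q for p, q in zip(prices, logit_demand(prices))]


def simulate(n_firms=2, n_prices=5, p_min=1.2, p_max=2.0,
             alpha=0.15, gamma=0.95, beta=1e-4, periods=20000, seed=0):
    """Run independent Q-learners; return the price grid and the last prices set."""
    rng = random.Random(seed)
    step = (p_max - p_min) / (n_prices - 1)
    grid = [p_min + k * step for k in range(n_prices)]

    def encode(actions):
        # Assumed state: the vector of last-period price indices, as one integer.
        s = 0
        for a in actions:
            s = s * n_prices + a
        return s

    n_states = n_prices ** n_firms
    # One Q-table per firm, indexed as q[firm][state][action].
    q = [[[0.0] * n_prices for _ in range(n_states)] for _ in range(n_firms)]

    actions = [rng.randrange(n_prices) for _ in range(n_firms)]
    state = encode(actions)
    for t in range(periods):
        eps = math.exp(-beta * t)  # exploration probability decays over time
        actions = []
        for i in range(n_firms):
            if rng.random() < eps:
                actions.append(rng.randrange(n_prices))  # explore
            else:
                row = q[i][state]
                actions.append(row.index(max(row)))      # exploit
        pi = profits([grid[a] for a in actions])
        next_state = encode(actions)
        for i in range(n_firms):
            # Standard Q-learning update; each firm sees only its own profit.
            target = pi[i] + gamma * max(q[i][next_state])
            q[i][state][actions[i]] += alpha * (target - q[i][state][actions[i]])
        state = next_state
    return grid, [grid[a] for a in actions]


if __name__ == "__main__":
    grid, final_prices = simulate()
    print("price grid:", grid)
    print("prices in the last period:", final_prices)
```

The firms share no information beyond the publicly observed past prices, which mirrors the "without communicating" condition in the abstract. Whether the learners in this toy version settle on high prices depends on the assumed parameters and the length of the run.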
