Stochastic Adaptive Dynamics of a Simple Market as a Non-Stationary Multi-Armed Bandit Problem

We formulate a dynamic monopoly pricing model as a non-stationary multi-armed bandit problem. In each period, the monopolist chooses a price from a finite set, and each customer decides stochastically, and independently of the others, whether to visit the monopolist's store. Each customer is characterized by two parameters: an ability-to-pay and a probability of visiting. The problem is non-stationary for the monopolist because each customer revises this probability with experience. We define an ex-ante optimal price and then examine two ways of learning it. In the first part, assuming the monopolist knows everything except the customers' ability-to-pay, we propose a simple counting rule based on purchase behavior that gives him enough information to compute the optimal price. In the second part, assuming no particular knowledge, we consider a monopolist who uses an adaptive stochastic algorithm. Our simulations suggest that the monopolist chooses the optimal price on each sample path when learning is easy, but not when learning is difficult.
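
The setting described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model or algorithm of the paper: the abstract does not specify how customers revise their visit probability or which adaptive stochastic algorithm the monopolist uses. Here we assume, hypothetically, that a visiting customer buys when the price does not exceed his ability-to-pay, that he moves his visit probability linearly toward 1 after a purchase and toward 0 after a failed visit, and that the monopolist applies a linear reward-inaction reinforcement rule to normalized revenue. The names `simulate`, `step`, `visit_prob`, and `choice_prob` are illustrative.

```python
import random


def simulate(prices, customers, periods=5000, step=0.05, seed=0):
    """Simulate one sample path of the pricing game (illustrative sketch).

    prices:    finite list of positive prices available to the monopolist.
    customers: list of (ability_to_pay, initial_visit_probability) pairs.
    Returns (choice_prob, visit_prob) at the end of the run.
    """
    rng = random.Random(seed)
    visit_prob = [p for _, p in customers]
    # Largest revenue attainable in one period; used to scale rewards to [0, 1].
    max_revenue = max(prices) * len(customers)
    # The monopolist starts with a uniform mixed strategy over prices.
    choice_prob = [1.0 / len(prices)] * len(prices)

    for _ in range(periods):
        k = rng.choices(range(len(prices)), weights=choice_prob)[0]
        price = prices[k]
        revenue = 0.0
        for i, (ability, _) in enumerate(customers):
            if rng.random() >= visit_prob[i]:
                continue  # no visit, hence no experience and no update
            bought = price <= ability  # assumed purchase rule
            if bought:
                revenue += price
            # Assumed customer rule: move linearly toward 1 or 0.
            target = 1.0 if bought else 0.0
            visit_prob[i] += step * (target - visit_prob[i])
        # Assumed monopolist rule: linear reward-inaction reinforcement.
        reward = revenue / max_revenue
        for j in range(len(prices)):
            indicator = 1.0 if j == k else 0.0
            choice_prob[j] += step * reward * (indicator - choice_prob[j])
    return choice_prob, visit_prob


if __name__ == "__main__":
    prices = [1.0, 2.0, 3.0]
    customers = [(1.5, 0.5), (2.5, 0.5), (3.5, 0.5)]
    choice_prob, visit_prob = simulate(prices, customers)
    print("price probabilities:", choice_prob)
    print("visit probabilities:", visit_prob)
```

Because the customers' visit probabilities change with the prices they encounter, the revenue distribution attached to each price drifts over time, which is the sense in which the bandit is non-stationary. Running the sketch with different `seed` values gives different sample paths, so one can check informally whether the price probabilities concentrate on the same price each time.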
