Paper Information - Stochastic Adaptive Dynamics of a Simple Market as a Non-Stationary Multi-Armed Bandit Problem

Stochastic Adaptive Dynamics of a Simple Market as a Non-Stationary Multi-Armed Bandit Problem

We develop a dynamic monopoly pricing model as a non-stationary multi-armed bandit problem. In each period, the monopolist chooses a price from a finite set, and each customer decides stochastically and independently whether or not to visit his store. Each customer is characterized by two parameters: an ability-to-pay and a probability of visiting. The problem is non-stationary for the monopolist because each customer revises his visiting probability with experience. We define an ex-ante optimal price for the problem and then examine two different ways of learning this optimal price. In the first part, assuming the monopolist knows everything except the customers' ability-to-pay, we propose a simple counting rule based on purchase behavior that allows him to obtain enough information to compute the optimal price. In the second part, assuming no particular knowledge, we consider the case in which the monopolist uses an adaptive stochastic algorithm. Our simulations suggest that the monopolist chooses the optimal price on each sample path when learning is easy, but does not when learning is difficult.
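The abstract does not state the customers' updating rule or the monopolist's adaptive algorithm, so the following Python sketch only illustrates the general setup under assumed rules and should not be read as the paper's method. It assumes that a visiting customer buys one unit when the price does not exceed his ability-to-pay, that his visiting probability moves up after a purchase and down after a failed visit by a linear (Bush-Mosteller-style) step, and that the monopolist updates a mixed strategy over prices with a linear reward-inaction rule. The function name `simulate_market` and the parameters `step_size` and `learning_rate` are hypothetical.

```python
import random


def simulate_market(prices, abilities, visit_probs, steps=2000,
                    step_size=0.05, learning_rate=0.01, seed=0):
    """Simulate a monopolist learning a price against adaptive customers.

    All update rules here are illustrative assumptions, not the paper's.
    Returns the monopolist's final price-choice probabilities and the
    customers' final visiting probabilities.
    """
    rng = random.Random(seed)
    n_prices = len(prices)
    # The monopolist starts with a uniform mixed strategy over prices.
    choice_probs = [1.0 / n_prices] * n_prices
    visit_probs = list(visit_probs)
    # Largest revenue attainable in one period, used to scale rewards to [0, 1].
    max_revenue = max(prices) * len(abilities)

    for _ in range(steps):
        # Draw a price index according to the current mixed strategy.
        k = rng.choices(range(n_prices), weights=choice_probs)[0]
        price = prices[k]
        revenue = 0.0

        for i, ability in enumerate(abilities):
            # Each customer visits independently with his own probability.
            if rng.random() < visit_probs[i]:
                if price <= ability:
                    # Assumed: a purchase reinforces visiting.
                    revenue += price
                    visit_probs[i] += step_size * (1.0 - visit_probs[i])
                else:
                    # Assumed: an unaffordable price discourages visiting.
                    visit_probs[i] -= step_size * visit_probs[i]

        reward = revenue / max_revenue
        # Linear reward-inaction update: shift probability mass toward the
        # chosen price in proportion to the scaled reward.
        for j in range(n_prices):
            if j == k:
                choice_probs[j] += learning_rate * reward * (1.0 - choice_probs[j])
            else:
                choice_probs[j] -= learning_rate * reward * choice_probs[j]

    return choice_probs, visit_probs


if __name__ == "__main__":
    final_choice, final_visit = simulate_market(
        prices=[1.0, 2.0, 3.0],
        abilities=[1.5, 2.5, 2.5, 3.5],
        visit_probs=[0.5, 0.5, 0.5, 0.5],
    )
    print("price-choice probabilities:", final_choice)
    print("visiting probabilities:", final_visit)
```

The environment is non-stationary from the monopolist's side because the expected revenue of each price depends on the current visiting probabilities, which themselves change with the prices charged in the past. Varying `step_size` and `learning_rate` is one way to explore easy versus difficult learning regimes in this toy version.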

Yann Braouezec
