Paper Information - How to Minimize Maximum Regret in Repeated Decision-Making

How to Minimize Maximum Regret in Repeated Decision-Making

Consider repeated decision making in a stationary noisy environment given a finite set of actions in each round. Payoffs belong to a known bounded interval. A rule or strategy attains minimax regret if it minimizes over all rules the maximum over all payoff distributions of the difference between achievable and achieved discounted expected payoffs. Linear rules that attain minimax regret are shown to exist and are optimal for a Bayesian decision-maker endowed with the prior where learning is most difficult. Minimax regret behavior for choosing between two actions given small or intermediate discount factors is derived and only requires two rounds of memory. JEL classification: D81, D83.
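The regret criterion in the abstract can be illustrated with a small numerical sketch. The code below is not the rule derived in the paper. It assumes two actions with Bernoulli payoffs in [0, 1], a simple win-stay/lose-shift rule chosen only for illustration, a uniformly random first action, discounted payoffs normalized by (1 - delta), and a grid search standing in for the maximum over all payoff distributions. All function names are hypothetical.

```python
from itertools import product


def wsls_discounted_payoff(p1, p2, delta):
    """Normalized discounted expected payoff of win-stay/lose-shift.

    Arms pay 1 with probability p1 or p2 and 0 otherwise; the first
    arm is drawn uniformly at random (an illustrative assumption).
    Requires 0 <= delta < 1.
    """
    # v_i is the value when arm i is about to be played:
    #   v_i = (1 - delta) * p_i + delta * (p_i * v_i + (1 - p_i) * v_j)
    # Rearranged into a 2x2 linear system and solved by Cramer's rule.
    a, b = 1 - delta * p1, -delta * (1 - p1)
    c, d = -delta * (1 - p2), 1 - delta * p2
    e, f = (1 - delta) * p1, (1 - delta) * p2
    det = a * d - b * c
    v1 = (e * d - b * f) / det
    v2 = (a * f - e * c) / det
    return 0.5 * (v1 + v2)


def regret(p1, p2, delta):
    """Achievable payoff (always play the better arm) minus achieved payoff."""
    return max(p1, p2) - wsls_discounted_payoff(p1, p2, delta)


def max_regret(delta, steps=20):
    """Approximate the maximum regret by a grid search over (p1, p2)."""
    grid = [i / steps for i in range(steps + 1)]
    return max(regret(p1, p2, delta) for p1, p2 in product(grid, grid))


if __name__ == "__main__":
    for delta in (0.2, 0.5, 0.8):
        print(delta, round(max_regret(delta), 4))
```

A minimax regret rule would be one that minimizes `max_regret` over all admissible rules. The sketch only evaluates the criterion for one fixed rule and a restricted family of distributions.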

Karl H. Schlag | K. Schlag

[1] Leonard J. Savage, et al. The Theory of Statistical Decision, 1951.

[2] J. Isbell. On a Problem of Robbins, 1959.

[3] H. Robbins. Some Aspects of the Sequential Design of Experiments, 1952.

[4] H. Simon, et al. Models of Bounded Rationality: Empirically Grounded Economic Reason, 1997.

[5] H. Robbins, et al. A Sequential Decision Problem with a Finite Memory, 1956, Proceedings of the National Academy of Sciences of the United States of America.

[6] Antonio J. Morales, et al. Expedient and Monotone Learning Rules, 2004.

[7] Kumpati S. Narendra, et al. Learning Automata: An Introduction, 1989.

[8] S. French, et al. Decision Theory: An Introduction to the Mathematics of Rationality, 1988.

[9] Gary Chamberlain, et al. Econometrics and Decision Theory, 2000.

[10] S. M. Samuels. Randomized Rules for the Two-Armed-Bandit with Finite Memory, 1968.

[11] M. L. Tsetlin. On the Behavior of Finite Automata in Random Media, 1961.

[12] Abraham Wald, et al. Statistical Decision Functions, 1951.

[13] A Note on Discounted Future Two-Armed Bandits, 1983.

[14] I. Gilboa, et al. Maxmin Expected Utility with Non-Unique Prior, 1989.