Conditional Universal Consistency

Players choose an action before learning an outcome chosen according to an unknown and history-dependent stochastic rule. We study procedures that categorize outcomes and use a randomized variation on fictitious play within each category. These procedures are “conditionally consistent”: they yield almost as high a time-average payoff as if the player knew the conditional distributions of actions given categories. Moreover, given any alternative procedure, there is a conditionally consistent procedure whose performance is no more than epsilon worse regardless of the discount factor. We also discuss cycles, and argue that the time-average of play should resemble a correlated equilibrium.
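The idea of running a randomized variation on fictitious play separately within each category can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the class name `ConditionalSmoothFictitiousPlay`, the logit (softmax) form of randomization, the `temperature` parameter, and the convention that the caller supplies the category label each period. The sketch keeps empirical outcome counts per category and plays a smoothed best response to the empirical distribution of the current category.

```python
import math
import random
from collections import defaultdict


class ConditionalSmoothFictitiousPlay:
    """Illustrative sketch: one smoothed fictitious-play learner per category.

    Assumption: randomization is a logit (softmax) best response; the abstract
    only says "a randomized variation on fictitious play".
    """

    def __init__(self, actions, outcomes, payoff, temperature=0.1, seed=0):
        self.actions = list(actions)
        self.outcomes = list(outcomes)
        self.payoff = payoff              # payoff(action, outcome) -> float
        self.temperature = temperature    # hypothetical smoothing parameter
        self.rng = random.Random(seed)
        # counts[category][outcome] = times outcome was observed in category
        self.counts = defaultdict(lambda: {o: 0 for o in self.outcomes})

    def mixed_strategy(self, category):
        """Return action probabilities given the history in this category."""
        counts = self.counts[category]
        total = sum(counts.values())
        if total == 0:
            # No data for this category yet: randomize uniformly.
            return {a: 1.0 / len(self.actions) for a in self.actions}
        # Expected payoff of each action against the empirical distribution.
        expected = {
            a: sum(counts[o] * self.payoff(a, o) for o in self.outcomes) / total
            for a in self.actions
        }
        best = max(expected.values())  # subtract the max for numerical stability
        weights = {
            a: math.exp((expected[a] - best) / self.temperature)
            for a in self.actions
        }
        norm = sum(weights.values())
        return {a: w / norm for a, w in weights.items()}

    def choose(self, category):
        """Sample an action from the smoothed best response."""
        probs = self.mixed_strategy(category)
        return self.rng.choices(
            self.actions, weights=[probs[a] for a in self.actions]
        )[0]

    def update(self, category, outcome):
        """Record the realized outcome under the category in force."""
        self.counts[category][outcome] += 1


# Usage: a matching game in which the player earns 1 for guessing the outcome.
learner = ConditionalSmoothFictitiousPlay(
    actions=["H", "T"],
    outcomes=["H", "T"],
    payoff=lambda a, o: 1.0 if a == o else 0.0,
)
for _ in range(10):
    learner.update("after_H", "H")   # "after_H" is a hypothetical category label
strategy_seen = learner.mixed_strategy("after_H")
strategy_unseen = learner.mixed_strategy("after_T")
```

In this sketch the learner puts nearly all probability on `"H"` in the category where only `"H"` has been observed, and stays uniform in a category with no observations. How categories are formed (for example, from the previous period's outcome) is left to the caller and is not specified by the abstract.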
