Gambling in a rigged casino: The adversarial multi-armed bandit problem
Nicolò Cesa-Bianchi | Peter Auer | Yoav Freund | Robert E. Schapire
[1] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[2] James Hannan,et al. Approximation to Bayes Risk in Repeated Play , 1958 .
[3] Journal of the Association for Computing Machinery , 1961, Nature.
[4] A. Banos. On Pseudo-Games , 1968 .
[5] J. Neveu,et al. Discrete Parameter Martingales , 1975 .
[6] N. Megiddo. On repeated games with incomplete information played by non-Bayesian players , 1980 .
[7] Christian M. Ernst,et al. Multi-armed Bandit Allocation Indices , 1989 .
[8] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[9] Vladimir Vovk,et al. Aggregating strategies , 1990, COLT '90.
[10] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[11] David Haussler,et al. How to use expert advice , 1993, STOC.
[12] Dean P. Foster,et al. A Randomization Rule for Selecting Forecasts , 1993, Oper. Res..
[13] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1994, Inf. Comput..
[14] P. Varaiya,et al. Multi-Armed bandit problem revisited , 1994 .
[15] D. Fudenberg,et al. Consistency and Cautious Fictitious Play , 1995 .
[16] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[17] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.
[18] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .