Exploiting exploration strategies in repeated normal form security games
[1] Ali E. Abbas, et al. Maximum Entropy Utility, 2004, Oper. Res.
[2] J. Gittins. Bandit Processes and Dynamic Allocation Indices, 1979.
[3] E. Rasmusen. Games and Information, 1989.
[4] Yoav Shoham, et al. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2009.
[5] P. W. Jones, et al. Bandit Problems: Sequential Allocation of Experiments, 1987.
[6] H. P. Young, et al. On the Impossibility of Predicting the Behavior of Rational Agents, 2001, Proceedings of the National Academy of Sciences of the United States of America.
[7] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[8] H. Robbins. Some Aspects of the Sequential Design of Experiments, 1952.
[9] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[10] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[11] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[12] E. Jaynes. On the Rationale of Maximum-Entropy Methods, 1982, Proceedings of the IEEE.