[1] Murray L Weidenbaum,et al. Learning to compete , 1986 .
[2] R. McKelvey,et al. Quantal Response Equilibria for Normal Form Games , 1995 .
[3] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[4] Naftali Tishby,et al. Data Clustering by Markovian Relaxation and the Information Bottleneck Method , 2000, NIPS.
[5] Kee-Eung Kim,et al. Reactive bandits with attitude , 2015, AISTATS.
[6] Daniel A. Braun,et al. Information, Utility and Bounded Rationality , 2011, AGI.
[7] Aleksandrs Slivkins,et al. One Practical Algorithm for Both Stochastic and Adversarial Bandits , 2014, ICML.
[8] Eckehard Olbrich,et al. Hysteresis Effects of Changing Parameters of Noncooperative Games , 2010, Physical review. E, Statistical, nonlinear, and soft matter physics.
[9] Joan Bruna,et al. Intriguing properties of neural networks , 2013, ICLR.
[10] Inderjit S. Dhillon,et al. Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res..
[11] Peter Auer,et al. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits , 2016, COLT.
[12] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[13] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[14] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[15] Sebastian Nowozin,et al. The Numerics of GANs , 2017, NIPS.
[16] Stuart J. Russell. Rationality and Intelligence , 1995, IJCAI.
[17] Yoav Shoham,et al. New Criteria and a New Algorithm for Learning in Multi-Agent Systems , 2004, NIPS.
[18] J. Neumann,et al. Theory of games and economic behavior, 2nd rev. ed. , 1947 .
[19] Hilbert J. Kappen,et al. Risk Sensitive Path Integral Control , 2010, UAI.
[20] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[21] Laurent Orseau,et al. AI Safety Gridworlds , 2017, ArXiv.
[22] Thore Graepel,et al. The Mechanics of n-Player Differentiable Games , 2018, ICML.
[23] Daniel Polani,et al. Information Theory of Decisions and Actions , 2011 .
[24] B. Jones. Bounded Rationality , 1999 .
[25] Pablo Hernandez-Leal,et al. Learning against sequential opponents in repeated stochastic games , 2017 .
[26] Ariel Rubinstein,et al. A Course in Game Theory , 1995 .
[27] Keith B. Hall,et al. Correlated Q-Learning , 2003, ICML.
[28] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[29] Gal Chechik,et al. Information Bottleneck for Gaussian Variables , 2003, J. Mach. Learn. Res..
[30] Jordi Grau-Moya,et al. Bounded Rationality, Abstraction, and Hierarchical Decision-Making: An Information-Theoretic Optimality Principle , 2015, Front. Robot. AI.
[31] Daniel A. Braun,et al. Thermodynamics as a theory of decision-making with information-processing costs , 2012, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
[32] J. Zico Kolter,et al. What game are we playing? End-to-end learning in normal and extensive form games , 2018, IJCAI.
[33] Michael L. Littman,et al. Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.
[34] Michael A. Goodrich,et al. Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning , 2011, Machine Learning.
[35] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[36] Aleksandrs Slivkins,et al. The Best of Both Worlds: Stochastic and Adversarial Bandits , 2012, COLT.