[1] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[2] Csaba Szepesvari, et al. Bandit Algorithms, 2020.
[3] Michael H. Bowling, et al. Learning to Be Cautious, 2021, ArXiv.
[4] L. C. Baird, et al. Reinforcement Learning in Continuous Time: Advantage Updating, 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[5] Yves Breitmoser, et al. On the Beliefs Off the Path: Equilibrium Refinement Due to Quantal Response and Level-K, 2010, Games Econ. Behav.
[6] Michael H. Bowling, et al. No-Regret Learning in Extensive-Form Games with Imperfect Recall, 2012, ICML.
[7] F. Wijsbegeerte. Epistemic Game Theory, 2016.
[8] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.
[9] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[10] Michael Bowling, et al. Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games, 2021, ICML.
[11] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[12] H. W. Kuhn. Extensive Games and the Problem of Information, 2020, Classics in Game Theory.
[13] Shlomo Zilberstein, et al. Dynamic Programming for Partially Observable Stochastic Games, 2004, AAAI.
[14] Tuomas Sandholm, et al. Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games, 2018, AAAI.
[15] Amy Greenwald, et al. Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods, 2017.
[16] Peter Stone, et al. Learning Predictive State Representations, 2003, ICML.
[17] Michael Bowling, et al. Hindsight and Sequential Rationality of Correlated Play, 2021, AAAI.
[18] Michael H. Bowling, et al. Rethinking Formal Models of Partially Observable Multiagent Decision Making, 2019, Artif. Intell.
[19] Zheng Li, et al. Bounds for Regret-Matching Algorithms, 2006, AI&M.
[20] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.
[21] Michael R. James, et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.
[22] Michael H. Bowling, et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments, 2018, NeurIPS.
[23] Yishay Mansour, et al. From External to Internal Regret, 2005, J. Mach. Learn. Res.