暂无分享,去创建一个
[1] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[2] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[3] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[4] Gabriel Kalweit,et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning , 2017, CoRL.
[5] Dorian Kodelja,et al. Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.
[6] Mohammad Norouzi,et al. Dream to Control: Learning Behaviors by Latent Imagination , 2019, ICLR.
[7] Guy Lever,et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.
[8] Mario Coppola,et al. MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models , 2021, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[9] Jonathan L. Shapiro,et al. Opponent Modeling by Expectation–Maximization and Sequence Prediction in Simplified Poker , 2017, IEEE Transactions on Computational Intelligence and AI in Games.
[10] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.
[11] Jonathan P. How,et al. Learning for multi-robot cooperation in partially observable stochastic environments with macro-actions , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[12] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[13] Shimon Whiteson,et al. Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.
[14] Edwin K. P. Chong,et al. Decentralized Guidance Control of UAVs with Explicit Optimization of Communication , 2014, J. Intell. Robotic Syst..
[15] Claudia V. Goldman,et al. The complexity of multiagent systems: the price of silence , 2003, AAMAS '03.
[16] Shimon Whiteson,et al. Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients , 2021, ArXiv.
[17] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[18] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[19] Sham M. Kakade,et al. Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity , 2020, NeurIPS.
[20] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[21] Pieter Abbeel,et al. Planning to Explore via Self-Supervised World Models , 2020, ICML.
[22] O. H. Brownlee,et al. ACTIVITY ANALYSIS OF PRODUCTION AND ALLOCATION , 1952 .
[23] Igor Mordatch,et al. Multi Agent Reinforcement Learning with Multi-Step Generative Models , 2019, CoRL.
[24] Shimon Whiteson,et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.
[25] H. Francis Song,et al. The Hanabi Challenge: A New Frontier for AI Research , 2019, Artif. Intell..
[26] Bikramjit Banerjee,et al. Multi-agent reinforcement learning as a rehearsal for decentralized planning , 2016, Neurocomputing.
[27] Nikos A. Vlassis,et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..
[28] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[29] Nikos A. Vlassis,et al. The Cross-Entropy Method for Policy Search in Decentralized POMDPs , 2008, Informatica.
[30] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[31] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[32] Krzysztof Choromanski,et al. Ready Policy One: World Building Through Active Learning , 2020, ICML.
[33] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.
[34] Frans A. Oliehoek,et al. A Concise Introduction to Decentralized POMDPs , 2016, SpringerBriefs in Intelligent Systems.
[35] Lantao Yu,et al. MOPO: Model-based Offline Policy Optimization , 2020, NeurIPS.
[36] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[37] Shimon Whiteson,et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.