Guannan Qu, Weichang Wang, Yutian Pang, Zhe Xu, Yongming Liu, Jueming Hu
[1] Carlos Guestrin et al. Multiagent Planning with Factored MDPs, 2001, NIPS.
[2] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[3] Yi Wu et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[4] Matthew E. Taylor et al. A Survey and Critique of Multiagent Deep Reinforcement Learning, 2019, Autonomous Agents and Multi-Agent Systems.
[5] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[6] Stuart J. Russell et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[7] Ufuk Topcu et al. Distributed Policy Synthesis of Multiagent Systems With Graph Temporal Logic Specifications, 2020, IEEE Transactions on Control of Network Systems.
[8] Keiji Kanazawa et al. A Model for Reasoning About Persistence and Causation, 1989.
[9] Krysia Broda et al. Induction of Subgoal Automata for Reinforcement Learning, 2019, AAAI.
[10] Tom Melham et al. DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning, 2021, AAAI.
[11] Shobha Venkataraman et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[12] Shimon Whiteson et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[13] Shiqi Zhang et al. Learning Quadruped Locomotion Policies with Reward Machines, 2021, ArXiv.
[14] Tamer Basar et al. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents, 2018, ICML.
[15] Sheila A. McIlraith et al. Learning Reward Machines for Partially Observable Reinforcement Learning, 2019, NeurIPS.
[16] Sheila A. McIlraith et al. Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning, 2020, J. Artif. Intell. Res.
[17] Michael L. Littman. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[18] Lei Ying et al. 3M-RL: Multi-Resolution, Multi-Agent, Mean-Field Reinforcement Learning for Autonomous UAV Routing, 2021, IEEE Transactions on Intelligent Transportation Systems.
[19] Ufuk Topcu et al. Joint Inference of Reward Machines and Policies for Reinforcement Learning, 2020, ICAPS.
[20] Craig Boutilier et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.
[21] Bernard Silver et al. A Framework for Multi-Paradigmatic Learning, 1990, ML.
[22] Ufuk Topcu et al. Reward Machines for Cooperative Multi-Agent Reinforcement Learning, 2021, AAMAS.
[23] Krysia Broda et al. Induction and Exploitation of Subgoal Automata for Reinforcement Learning, 2021, J. Artif. Intell. Res.
[24] John N. Tsitsiklis et al. A Survey of Computational Complexity Results in Systems and Control, 2000, Autom.
[25] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[26] Yongming Liu et al. UAS Conflict Resolution Integrating a Risk-Based Operational Safety Bound as Airspace Reservation with Reinforcement Learning, 2020.
[27] Michael P. Wellman et al. Nash Q-Learning for General-Sum Stochastic Games, 2003, J. Mach. Learn. Res.
[28] Régis Sabbadin et al. Approximate Linear-Programming Algorithms for Graph-Based Markov Decision Processes, 2006, ECAI.
[29] Sheila A. McIlraith et al. Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning, 2018, ICML.
[30] A. Heald et al. “Stay at Home, Protect the National Health Service, Save Lives”: A Cost Benefit Analysis of the Lockdown in the United Kingdom, 2020, International Journal of Clinical Practice.
[31] Michael L. Littman. Value-Function Reinforcement Learning in Markov Games, 2001, Cognitive Systems Research.
[32] Qiang Liu et al. Variational Planning for Graph-Based MDPs, 2013, NIPS.
[33] Adam Wierman et al. Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems, 2019, L4DC.
[34] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[35] Gavin Rens et al. Learning Non-Markovian Reward Models in MDPs, 2020, ArXiv.
[36] M. Bernardo et al. Intermittent yet Coordinated Regional Strategies Can Alleviate the COVID-19 Epidemic: A Network Model of the Italian Case, 2020, ArXiv 2005.07594.
[37] Shen Li et al. Planning With Uncertain Specifications (PUnS), 2019, IEEE Robotics and Automation Letters.