Yi Wu | Zhongqian Sun | Rui Zhao | Haifeng Hu | Yang Gao | Jinming Song | Yang Wei
[1] H. Piaggio. Mathematical Analysis, 1955, Nature.
[2] D. C. Engelbart, et al. Augmenting human intellect: a conceptual framework, 1962.
[3] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[4] Claude Sammut, et al. A Framework for Behavioural Cloning, 1995, Machine Intelligence 15.
[5] Craig Boutilier, et al. Planning, Learning and Coordination in Multiagent Decision Processes, 1996, TARK.
[6] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[7] Marc Toussaint, et al. Robot trajectory optimization using approximate inference, 2009, ICML '09.
[8] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[9] Qiang Yang, et al. A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.
[10] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[11] Shimon Whiteson, et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning, 2016, NIPS.
[12] Joshua B. Tenenbaum, et al. Coordinate to cooperate or compete: Abstract goals and joint intentions in social interaction, 2016, CogSci.
[13] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[14] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[15] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[16] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[17] Shan Carter, et al. Using Artificial Intelligence to Augment Human Intelligence, 2017.
[18] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[19] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[20] Greg Turk, et al. Preparing for the Unknown: Learning a Universal Policy with Online System Identification, 2017, Robotics: Science and Systems.
[21] Alexander Peysakhovich, et al. Maintaining cooperation in complex social dilemmas using deep reinforcement learning, 2017, ArXiv.
[22] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[23] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[24] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[25] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[26] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[27] Marcin Andrychowicz, et al. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[28] Jason Weston, et al. Vehicle Community Strategies, 2018, ArXiv.
[29] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[30] Atil Iscen, et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, 2018, Robotics: Science and Systems.
[31] Alexander Peysakhovich, et al. Learning Social Conventions in Markov Games, 2018, ArXiv.
[32] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[33] Sergey Levine, et al. Latent Space Policies for Hierarchical Reinforcement Learning, 2018, ICML.
[34] Finale Doshi-Velez, et al. Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies, 2019, IJCAI.
[35] Sergey Levine, et al. Learning to Walk via Deep Reinforcement Learning, 2018, Robotics: Science and Systems.
[36] Marcin Andrychowicz, et al. Solving Rubik's Cube with a Robot Hand, 2019, ArXiv.
[37] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[38] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[39] Joshua B. Tenenbaum, et al. Theory of Minds: Understanding Behavior in Groups Through Inverse Planning, 2019, AAAI.
[40] Rui Zhao, et al. Maximum Entropy-Regularized Multi-Goal Reinforcement Learning, 2019, ICML.
[41] Anca D. Dragan, et al. On the Utility of Learning about Humans for Human-AI Coordination, 2019, NeurIPS.
[42] Julie Shah, et al. Adversarially Guided Self-Play for Adopting Social Conventions, 2020, ArXiv.
[43] K. Choromanski, et al. Effective Diversity in Population-Based Reinforcement Learning, 2020, NeurIPS.
[44] Jakob N. Foerster, et al. "Other-Play" for Zero-Shot Coordination, 2020, ICML.
[45] Jiechao Xiong, et al. TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game, 2020, ArXiv.
[46] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[47] Pieter Abbeel, et al. Mutual Information State Intrinsic Control, 2021, ICLR.
[48] S. Du, et al. Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization, 2021, ICLR.
[49] Hengyuan Hu, et al. Trajectory Diversity for Zero-Shot Coordination, 2021, AAMAS.
[50] Sam Devlin, et al. Evaluating the Robustness of Collaborative Agents, 2021, AAMAS.