Learning Diverse Risk Preferences in Population-based Self-play
Xiaoteng Ma | Yiqin Yang | Chenghao Li | Qianchuan Zhao | Bin Liang | Jun Yang | Qihan Liu | Y. Jiang
[1] Chao Yu,et al. Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased , 2023, ICLR.
[2] Dong Yan,et al. Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk , 2022, IJCAI.
[3] Zihan Zhou,et al. Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization , 2022, ICLR.
[4] N. Heess,et al. NeuPL: Neural Population Learning , 2022, ICLR.
[5] Yi Wu,et al. Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination , 2021, AAAI.
[6] Shane Legg,et al. Model-Free Risk-Sensitive Reinforcement Learning , 2021, ArXiv.
[7] Bin Liang,et al. Offline Reinforcement Learning with Value-based Episodic Memory , 2021, ICLR.
[8] Richard Everett,et al. Collaborating with Humans without Human Data , 2021, NeurIPS.
[9] Junhyuk Oh,et al. Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity , 2021, AAMAS.
[10] Qianchuan Zhao,et al. Celebrating Diversity in Shared Multi-Agent Reinforcement Learning , 2021, NeurIPS.
[11] Matthijs T. J. Spaan,et al. WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning , 2021, AAAI.
[12] Yaodong Yang,et al. Modelling Behavioural Diversity for Learning in Open-Ended Games , 2021, ICML.
[13] Javier Ruiz-del-Solar,et al. Learning to Play Soccer From Scratch: Sample-Efficient Emergent Coordination Through Curriculum-Learning and Competition , 2021, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[14] Boyuan Chen,et al. Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization , 2021, ICLR.
[15] Yu Wang,et al. The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games , 2021, NeurIPS.
[16] Xinrun Wang,et al. RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents , 2021, NeurIPS.
[17] Matthew E. Taylor,et al. Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems , 2021, AAMAS.
[18] Yan Zheng,et al. Generating Behavior-Diverse Game AIs with Evolutionary Multi-Objective Deep Reinforcement Learning , 2020, IJCAI.
[19] Roy Fox,et al. Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games , 2020, NeurIPS.
[20] Max Jaderberg,et al. Real World Games Look Like Spinning Tops , 2020, NeurIPS.
[21] K. Choromanski,et al. Effective Diversity in Population-Based Reinforcement Learning , 2020, NeurIPS.
[22] Ruslan Salakhutdinov,et al. Worst Cases Policy Gradients , 2019, CoRL.
[23] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[24] Igor Mordatch,et al. Emergent Tool Use From Multi-Agent Autocurricula , 2019, ICLR.
[25] Finale Doshi-Velez,et al. Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies , 2019, IJCAI.
[26] Greg Turk,et al. Learning Novel Policies For Tasks , 2019, ICML.
[27] Kagan Tumer,et al. Collaborative Evolutionary Reinforcement Learning , 2019, ICML.
[28] Marc G. Bellemare,et al. Statistics and Samples in Distributional Reinforcement Learning , 2019, ICML.
[29] Guy Lever,et al. Emergent Coordination Through Competition , 2019, ICLR.
[30] Ang Li,et al. A Generalized Framework for Population Based Training , 2019, KDD.
[31] Max Jaderberg,et al. Open-ended Learning in Symmetric Zero-sum Games , 2019, ICML.
[32] M. Fu,et al. Risk-Sensitive Reinforcement Learning via Policy Gradient Search , 2018, Found. Trends Mach. Learn.
[33] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.
[34] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[35] Zhang-Wei Hong,et al. Diversity-Driven Exploration Strategy for Deep Reinforcement Learning , 2018, NeurIPS.
[36] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[37] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[38] Michael I. Jordan,et al. Ray: A Distributed Framework for Emerging AI Applications , 2017, OSDI.
[39] Max Jaderberg,et al. Population Based Training of Neural Networks , 2017, ArXiv.
[40] David Silver,et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning , 2017, NIPS.
[41] P. Abbeel,et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments , 2017, ICLR.
[42] Jakub W. Pachocki,et al. Emergent Complexity via Multi-Agent Competition , 2017, ICLR.
[43] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[44] Frank Hutter,et al. CMA-ES for Hyperparameter Optimization of Deep Neural Networks , 2016, ArXiv.
[45] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[46] David Silver,et al. Fictitious Self-Play in Extensive-Form Games , 2015, ICML.
[47] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[48] Michael C. Fu,et al. Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control , 2015, ICML.
[49] Mohammad Ghavamzadeh,et al. Algorithms for CVaR Optimization in MDPs , 2014, NIPS.
[50] Avrim Blum,et al. Planning in the Presence of Cost Functions Controlled by an Adversary , 2003, ICML.
[51] A. Müller. Integral Probability Metrics and Their Generating Classes of Functions , 1997, Advances in Applied Probability.
[52] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[53] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[54] A. Tversky,et al. Advances in prospect theory: Cumulative representation of uncertainty , 1992.
[55] W. Newey,et al. Asymmetric Least Squares Estimation and Testing , 1987.
[56] Yaodong Yang,et al. A Unified Diversity Measure for Multiagent Reinforcement Learning , 2022, NeurIPS.
[57] Rousslan Fernand Julien Dossa,et al. CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms , 2022, J. Mach. Learn. Res.
[58] Yaodong Yang,et al. Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games , 2021, NeurIPS.
[59] Hengyuan Hu,et al. Trajectory Diversity for Zero-Shot Coordination , 2021, AAMAS.
[60] R. Rockafellar,et al. Optimization of conditional value-at-risk , 2000.