David Mguni | Yaodong Yang | Yali Du | Taher Jafferjee | Nicolas Perez Nieves | Jun Wang | Hui Chen | Feifei Tong | Jiangcheng Zhu | Jianhong Wang | Wenbin Song
[1] Yujing Hu,et al. Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping , 2020, NeurIPS.
[2] Xiaotie Deng,et al. Settling the complexity of computing two-player Nash equilibria , 2007, JACM.
[3] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[4] Dimitri P. Bertsekas,et al. Approximate Dynamic Programming , 2017, Encyclopedia of Machine Learning and Data Mining.
[5] Sam Devlin,et al. Theoretical considerations of potential-based reward shaping for multi-agent systems , 2011, AAMAS.
[6] Sam Devlin,et al. Policy invariance under reward transformations for multi-objective reinforcement learning , 2017, Neurocomputing.
[7] Giovanni Montana,et al. PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals , 2020, NeurIPS.
[8] Richard Socher,et al. Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards , 2019, NeurIPS.
[9] O. J. Vrieze,et al. On stochastic games with additive reward and transition structure , 1985.
[10] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[11] S. Shreve,et al. Stochastic differential equations , 1955, Mathematical Proceedings of the Cambridge Philosophical Society.
[12] John N. Tsitsiklis,et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control.
[13] Dongbin Zhao,et al. A Survey of Deep Reinforcement Learning in Video Games , 2019, ArXiv.
[14] David Mguni,et al. Cutting Your Losses: Learning Fault-Tolerant Control and Optimal Stopping under Adverse Risk , 2019, ArXiv.
[15] Jimmy Ba,et al. Learning Intrinsic Rewards as a Bi-Level Optimization Problem , 2020, UAI.
[16] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[17] Weinan Zhang,et al. Bi-level Actor-Critic for Multi-agent Coordination , 2020, AAAI.
[18] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci.
[19] Dong Yan,et al. Reward Shaping via Meta-Learning , 2019, ArXiv.
[20] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[21] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[22] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res.
[23] Amos J. Storkey,et al. Exploration by Random Network Distillation , 2018, ICLR.
[24] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[25] Shimon Whiteson,et al. The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning , 2020, ArXiv.
[26] Jane X. Wang,et al. Reinforcement Learning, Fast and Slow , 2019, Trends in Cognitive Sciences.
[27] Julia Donaldson,et al. The big match , 2008.
[28] Sam Devlin,et al. An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems , 2011, Adv. Complex Syst.
[29] David C. Noelle,et al. Unsupervised Methods For Subgoal Discovery During Intrinsic Motivation in Model-Free Hierarchical Reinforcement Learning , 2019, KEG@AAAI.
[30] Michael L. Littman,et al. Cyclic Equilibria in Markov Games , 2005, NIPS.
[31] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[32] Sergio Valcarcel Macua,et al. Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems , 2019, AAMAS.
[33] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[34] Erhan Bayraktar,et al. On the One-Dimensional Optimal Switching Problem , 2007, Math. Oper. Res..
[35] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[36] Carmine Maria Pappalardo,et al. A Parametric Study of a Deep Reinforcement Learning Control System Applied to the Swing-Up Problem of the Cart-Pole , 2020, Applied Sciences.
[37] Sam Devlin,et al. Dynamic potential-based reward shaping , 2012, AAMAS.
[38] Bernhard Schölkopf,et al. Photorealistic Video Super Resolution , 2018, ArXiv.
[39] B. Stengel,et al. COMPUTING EQUILIBRIA FOR TWO-PERSON GAMES , 1996.
[40] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[41] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[42] H. Young,et al. Handbook of Game Theory with Economic Applications , 2015.
[43] Sonia Chernova,et al. Reinforcement Learning from Demonstration through Shaping , 2015, IJCAI.
[44] Carl E. Rasmussen,et al. Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning , 2011, Robotics: Science and Systems.
[45] Yaodong Yang,et al. An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective , 2020, ArXiv.
[46] Yoav Shoham,et al. Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations , 2009.
[47] Yaodong Yang,et al. Modelling Behavioural Diversity for Learning in Open-Ended Games , 2021, ICML.
[48] Ying Wen,et al. Learning in Nonzero-Sum Stochastic Games with Potentials , 2021, ICML.
[49] D. Mguni,et al. A Viscosity Approach to Stochastic Differential Games of Control and Stopping Involving Impulsive Control , 2018, 1803.11432.
[50] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996.
[51] Tamer Basar,et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.
[52] Peng Peng,et al. Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, 1703.10069.
[53] Sam Devlin,et al. Expressing Arbitrary Reward Functions as Potential-Based Advice , 2015, AAAI.
[54] Matthew E. Taylor,et al. Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems , 2021, AAMAS.
[55] Marc G. Bellemare,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[56] Traian Rebedea,et al. Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay , 2016, ArXiv.
[57] Yaodong Yang,et al. Multi-Agent Determinantal Q-Learning , 2020, ICML.