[1] Yoshua Bengio, et al. Hyperbolic Discounting and Learning over Multiple Horizons, 2019, ArXiv.
[2] Jun Tan, et al. Stabilizing Reinforcement Learning in Dynamic Environment with Application to Online Recommendation, 2018, KDD.
[3] Olexandr Isayev, et al. Deep reinforcement learning for de novo drug design, 2017, Science Advances.
[4] Konstantinos V. Katsikopoulos, et al. Markov decision processes with delays and asynchronous cost collection, 2003, IEEE Trans. Autom. Control.
[5] Sepp Hochreiter, et al. RUDDER: Return Decomposition for Delayed Rewards, 2018, NeurIPS.
[6] Yuandong Tian, et al. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning, 2016, ICLR.
[7] John Schulman, et al. Concrete Problems in AI Safety, 2016, ArXiv.
[8] Peter Stone, et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots, 2012, Machine Learning.
[9] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[10] Xi Chen, et al. Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning, 2019, ArXiv.
[11] Sam Devlin, et al. An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems, 2011, Adv. Complex Syst.
[12] Guy Lever, et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward, 2018, AAMAS.
[13] Doina Precup, et al. Hindsight Credit Assignment, 2019, NeurIPS.
[14] Maja J. Mataric, et al. Reward Functions for Accelerated Learning, 1994, ICML.
[15] Marek Petrik, et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor, 2008, NIPS.
[16] D. S. Moore, et al. The Basic Practice of Statistics, 2001.
[17] Jian Peng, et al. Off-Policy Reinforcement Learning with Delayed Rewards, 2021, ICML.
[18] Jonathan Binas, et al. Reinforcement Learning with Random Delays, 2021, ICLR.
[19] Richard L. Lewis, et al. Reward Design via Online Gradient Ascent, 2010, NIPS.
[20] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[21] Thomas J. Walsh, et al. Learning and planning in environments with delayed feedback, 2009, Autonomous Agents and Multi-Agent Systems.
[22] Hazhir Rahmandad, et al. Effects of feedback delay on learning, 2009.
[23] Renyuan Xu, et al. Learning in Generalized Linear Contextual Bandits with Stochastic Delays, 2019, NeurIPS.
[24] Sepp Hochreiter, et al. Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER, 2020, Trans. Large Scale Data Knowl. Centered Syst.
[25] P. Schrimpf, et al. Dynamic Programming, 2011.
[26] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[27] Björn Wittenmark, et al. Stochastic Analysis and Control of Real-time Systems with Random Time Delays, 1999.
[28] Michael I. Jordan, et al. On the Theory of Reinforcement Learning with Once-per-Episode Feedback, 2021, ArXiv.
[29] Chongjie Zhang, et al. Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization, 2020, NeurIPS.
[30] Hang Su, et al. Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning, 2019, IJCAI.
[31] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[32] Ron Kohavi, et al. Bias Plus Variance Decomposition for Zero-One Loss Functions, 1996, ICML.
[33] Lei Han, et al. LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning, 2019, NeurIPS.
[34] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[35] Hoong Chuin Lau, et al. Credit Assignment For Collective Multiagent RL With Global Rewards, 2018, NeurIPS.
[36] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[37] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[38] Richard L. Lewis, et al. Pairwise Weights for Temporal Credit Assignment, 2021, ArXiv.
[39] Honglak Lee, et al. Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games, 2016, IJCAI.
[40] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[41] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[42] Greg Wayne, et al. Synthetic Returns for Long-Term Credit Assignment, 2021, ArXiv.
[43] Srikanth Kandula, et al. Resource Management with Deep Reinforcement Learning, 2016, HotNets.
[44] Shie Mannor, et al. Reinforcement Learning with Trajectory Feedback, 2020, ArXiv.
[45] Sepp Hochreiter, et al. Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution, 2020, ArXiv.
[46] Thomas A. Runkler, et al. A benchmark environment motivated by industrial control problems, 2017, IEEE Symposium Series on Computational Intelligence (SSCI).
[47] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[48] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[49] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[50] Satinder Singh, et al. Generative Adversarial Self-Imitation Learning, 2018, ArXiv.
[51] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[52] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, ArXiv.
[53] Li Li, et al. Optimization of Molecules via Deep Reinforcement Learning, 2018, Scientific Reports.
[54] Yung Yi, et al. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning, 2019, ICML.
[55] Junhyuk Oh, et al. What Can Learned Intrinsic Rewards Capture?, 2019, ICML.
[56] Daniel Dewey, et al. Reinforcement Learning and the Reward Engineering Principle, 2014, AAAI Spring Symposia.
[57] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[58] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[59] Robert Babuska, et al. Control delay in Reinforcement Learning for real-time dynamic systems: A memoryless approach, 2010, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[60] Li Fei-Fei, et al. Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?, 2018, ICML.
[61] Yang Yu, et al. QPLEX: Duplex Dueling Multi-Agent Q-Learning, 2020, ArXiv.
[62] Yuan Zhou, et al. Learning Guidance Rewards with Trajectory-space Smoothing, 2020, NeurIPS.
[63] Qiang Liu, et al. Learning Self-Imitating Diverse Policies, 2018, ICLR.
[64] Pieter Abbeel, et al. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training, 2021, ICML.
[65] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[66] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[67] Shimon Whiteson, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, 2018, ICML.
[68] Zhengyuan Zhou, et al. Gradient-free Online Learning in Continuous Games with Delayed Rewards, 2020, ICML.