[1] Yan Wu,et al. Optimizing agent behavior over long time scales by transporting value , 2018, Nature Communications.
[2] Qiang Liu,et al. Learning Self-Imitating Diverse Policies , 2018, ICLR.
[3] Sae-Young Chung,et al. Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update , 2018, NeurIPS.
[4] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[5] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[6] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[7] Jen Jen Chung,et al. D++: Structural credit assignment in tightly coupled multiagent domains , 2016, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[8] Sepp Hochreiter,et al. RUDDER: Return Decomposition for Delayed Rewards , 2018, NeurIPS.
[9] Jürgen Schmidhuber,et al. A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .
[10] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[11] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[12] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Demis Hassabis,et al. Neural Episodic Control , 2017, ICML.
[15] Thomas Blaschke,et al. Molecular de-novo design through deep reinforcement learning , 2017, Journal of Cheminformatics.
[16] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[17] Xi Chen,et al. Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning , 2019, ArXiv.
[18] Jürgen Schmidhuber,et al. Artificial curiosity based on discovering novel algorithmic predictability through coevolution , 1999, Proceedings of the 1999 Congress on Evolutionary Computation (CEC99).
[19] Richard L. Lewis,et al. Internal Rewards Mitigate Agent Boundedness , 2010, ICML.
[20] Richard L. Lewis,et al. Where Do Rewards Come From , 2009 .
[21] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[22] Nobuaki Minematsu,et al. A Study on Invariance of $f$-Divergence and Its Application to Speech Recognition , 2010, IEEE Transactions on Signal Processing.
[23] Thomas A. Runkler,et al. A benchmark environment motivated by industrial control problems , 2017, IEEE Symposium Series on Computational Intelligence (SSCI).
[24] Kenneth O. Stanley,et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning , 2017, ArXiv.
[25] Lih-Yuan Deng,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning , 2006, Technometrics.
[26] Marvin Minsky,et al. Steps toward Artificial Intelligence , 1961, Proceedings of the IRE.
[27] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[28] Doina Precup,et al. Hindsight Credit Assignment , 2019, NeurIPS.
[29] Hazhir Rahmandad,et al. Effects of feedback delay on learning , 2009 .
[30] Richard L. Lewis,et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.
[31] Satinder Singh,et al. Generative Adversarial Self-Imitation Learning , 2018, ArXiv.
[32] Richard L. Lewis,et al. Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents , 2011, AAAI.
[33] Marcin Andrychowicz,et al. Parameter Space Noise for Exploration , 2017, ICLR.
[34] Tao Chen,et al. Hardware Conditioned Policies for Multi-Robot Transfer Learning , 2018, NeurIPS.
[35] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[36] Alexander J. Smola,et al. Deep Sets , 2017, arXiv:1703.06114.
[37] Shie Mannor,et al. The Cross Entropy Method for Fast Policy Search , 2003, ICML.