[1] Doina Precup,et al. The Termination Critic , 2019, AISTATS.
[2] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[3] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..
[4] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[5] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[6] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[7] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[8] Andrew G. Barto,et al. Causal Graph Based Decomposition of Factored MDPs , 2006, J. Mach. Learn. Res..
[9] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[10] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[11] Shie Mannor,et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations , 2014, ICML.
[12] Doina Precup,et al. Safe Option-Critic: Learning Safety in the Option-Critic Architecture , 2018, The Knowledge Engineering Review.
[13] Andrew G. Barto,et al. Intrinsically Motivated Hierarchical Skill Learning in Structured Environments , 2010, IEEE Transactions on Autonomous Mental Development.
[14] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[15] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[16] Andrew G. Barto,et al. Skill Characterization Based on Betweenness , 2008, NIPS.
[17] Marlos C. Machado,et al. Exploration in Reinforcement Learning with Deep Covering Options , 2020, ICLR.
[18] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[19] Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.
[20] Shakir Mohamed,et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning , 2015, NIPS.
[21] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[22] Elman Mansimov,et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation , 2017, NIPS.
[23] Andrew G. Barto,et al. Behavioral Hierarchy: Exploration and Representation , 2013, Computational and Robotic Models of the Hierarchical Organization of Behavior.
[24] Chrystopher L. Nehaniv,et al. All Else Being Equal Be Empowered , 2005, ECAL.
[25] Andrew G. Barto,et al. Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.
[26] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[27] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[28] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[29] Daan Wierstra,et al. Variational Intrinsic Control , 2016, ICLR.
[30] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[31] Lihong Li,et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning , 2014, ICML.
[32] George Konidaris,et al. Discovering Options for Exploration by Minimizing Cover Time , 2019, ICML.
[33] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[34] Doina Precup,et al. Options of Interest: Temporal Abstraction with Interest Functions , 2020, AAAI.
[35] Pieter Abbeel,et al. Efficient Online Estimation of Empowerment for Reinforcement Learning , 2020, ArXiv.
[36] Christoph Salge,et al. Approximation of Empowerment in the Continuous Domain , 2013, Adv. Complex Syst..
[37] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[38] Doina Precup,et al. When Waiting is not an Option: Learning Options with a Deliberation Cost , 2017, AAAI.
[39] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[40] Sergey Levine,et al. Dynamics-Aware Unsupervised Discovery of Skills , 2019, ICLR.
[41] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[42] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[43] Alessandro Lazaric,et al. Exploration-Exploitation in MDPs with Options , 2016 .
[44] Doina Precup,et al. Learning Options End-to-End for Continuous Action Tasks , 2017, ArXiv.