Hao Tian | Hongsheng Zeng | Bo Zhou | Fan Wang | Yunxiang Li
[1] Makoto Sato,et al. Variance-Penalized Reinforcement Learning for Risk-Averse Asset Allocation , 2000, IDEAL.
[2] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[3] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, ACM SIGART Bulletin.
[4] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[5] Yang Gao,et al. Risk Averse Robust Adversarial Reinforcement Learning , 2019, 2019 International Conference on Robotics and Automation (ICRA).
[6] Jan Peters,et al. Bayesian optimization for learning gaits under uncertainty , 2015, Annals of Mathematics and Artificial Intelligence.
[7] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[8] Makoto Sato,et al. TD algorithm for the variance of return and mean-variance reinforcement learning , 2001 .
[9] Sebastian Thrun,et al. Issues in Using Function Approximation for Reinforcement Learning , 1999 .
[10] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[11] Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.
[12] N. E. Toklu,et al. Artificial Intelligence for Prosthetics - challenge solutions , 2019, The NeurIPS '18 Competition.
[13] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[14] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[15] Shuchang Zhou,et al. Learning to Run with Actor-Critic Ensemble , 2017, ArXiv.
[16] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[17] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[18] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[19] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[20] Gabriel Kalweit,et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning , 2017, CoRL.
[21] Ayman Habib,et al. OpenSim: Simulating musculoskeletal dynamics and neuromuscular control to study human and animal movement , 2018, PLoS Comput. Biol..
[22] Elena Smirnova,et al. Distributionally Robust Reinforcement Learning , 2019, ArXiv.
[23] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[24] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[25] Marc G. Bellemare,et al. Distributional Reinforcement Learning with Quantile Regression , 2017, AAAI.
[26] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[27] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[28] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[29] Pierre-Yves Oudeyer,et al. What is Intrinsic Motivation? A Typology of Computational Approaches , 2007, Frontiers in Neurorobotics.
[30] Elie Bienenstock,et al. Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.
[31] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.
[32] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[33] Amrita Saha,et al. Risk Averse Reinforcement Learning for Mixed Multi-agent Environments , 2019, AAMAS.
[34] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[35] Chris Gaskett,et al. Reinforcement learning under circumstances beyond its control , 2003 .
[36] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.