[1] Sonia Chernova, et al. Integrating reinforcement learning with human demonstrations of varying ability, 2011, AAMAS.
[2] Zoran Popovic, et al. Discovery of complex behaviors through contact-invariant optimization, 2012, ACM Trans. Graph.
[3] Satinder Singh, et al. Many-Goals Reinforcement Learning, 2018, ArXiv.
[4] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[5] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[6] Jackie Kay, et al. Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation, 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[7] Yuval Tassa, et al. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation, 2017, ArXiv.
[8] Sergey Levine, et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control, 2018, ICLR.
[9] Andrew W. Moore, et al. Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs, 1999, IJCAI.
[10] Oleg O. Sushkov, et al. Scaling data-driven robotics with reward sketching and batch reinforcement learning, 2019, Robotics: Science and Systems.
[11] Tom Schaul, et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[12] Pieter Abbeel, et al. Planning to Explore via Self-Supervised World Models, 2020, ICML.
[13] Marc Toussaint, et al. Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks, 2021, ICLR.
[14] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[15] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[16] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[18] Tim Salimans, et al. Learning Montezuma's Revenge from a Single Demonstration, 2018, ArXiv.
[19] Tadahiro Taniguchi, et al. Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model, 2020, Adv. Robotics.
[20] Scott Kuindersma, et al. Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot, 2015, Autonomous Robots.
[21] Wojciech Zaremba, et al. Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model, 2016, ArXiv.
[22] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[23] Emanuel Todorov, et al. Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids, 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[24] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[25] Sergey Levine, et al. Learning Latent Plans from Play, 2019, CoRL.
[26] Sergey Levine, et al. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, 2018, ACM Trans. Graph.
[27] Peter Stone, et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.
[28] Lydia Tapia, et al. PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[29] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[30] Sonia Chernova, et al. Recent Advances in Robot Learning from Demonstration, 2020, Annu. Rev. Control Robotics Auton. Syst.
[31] Sergey Levine, et al. Planning with Goal-Conditioned Policies, 2019, NeurIPS.
[32] Peter Dayan, et al. Structure in the Space of Value Functions, 2002, Machine Learning.
[33] Sergey Levine, et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[34] Marc Toussaint, et al. Differentiable Physics and Stable Modes for Tool-Use and Manipulation Planning, 2018, Robotics: Science and Systems.
[35] Sergey Levine, et al. Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings, 2018, ICML.
[36] Leslie Pack Kaelbling. Learning to Achieve Goals, 1993, IJCAI.
[37] Honglak Lee, et al. Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards, 2020, NeurIPS.
[38] R. Bellman. A Markovian Decision Process, 1957.
[39] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.