[1] Jonathan Scholz, et al. Generative Predecessor Models for Sample-Efficient Imitation Learning, 2019, ICLR.
[2] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[3] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[4] Yannick Schroecker, et al. State Aware Imitation Learning, 2017, NIPS.
[5] Anca D. Dragan, et al. SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards, 2019, ICLR.
[6] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[7] Philip S. Thomas, et al. Is the Policy Gradient a Gradient?, 2019, AAMAS.
[8] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[9] Anca D. Dragan, et al. SQIL: Imitation Learning via Regularized Behavioral Cloning, 2019, arXiv.
[10] Pieter Abbeel, et al. Apprenticeship Learning via Inverse Reinforcement Learning, 2004, ICML.
[11] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[12] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[13] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[14] Shane Legg, et al. Human-Level Control through Deep Reinforcement Learning, 2015, Nature.
[15] Nando de Freitas, et al. Playing Hard Exploration Games by Watching YouTube, 2018, NeurIPS.
[16] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[17] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[18] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[19] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[20] Patrick M. Pilarski, et al. Horde: A Scalable Real-Time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction, 2011, AAMAS.
[21] Samy Bengio, et al. Density Estimation Using Real NVP, 2016, ICLR.
[22] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[23] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[24] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[25] Tianfu Wu, et al. ARCHER: Aggressive Rewards to Counter Bias in Hindsight Experience Replay, 2018, arXiv.
[26] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[27] Yiannis Demiris, et al. Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation, 2019, ICML.
[28] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[29] Sergey Levine, et al. Visual Reinforcement Learning with Imagined Goals, 2018, NeurIPS.
[30] Sergey Levine, et al. DeepMimic, 2018, ACM Trans. Graph.
[31] Dean Pomerleau, et al. ALVINN: An Autonomous Land Vehicle in a Neural Network, 1988, NIPS.
[32] Jitendra Malik, et al. Zero-Shot Visual Imitation, 2018, CVPR Workshops.
[33] Sergey Levine, et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control, 2018, ICLR.
[34] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[35] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.