Off-Policy Imitation Learning from Observations
Bo Dai | Kaixiang Lin | Jiayu Zhou | Zhuangdi Zhu