Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Byron Boots | Geoffrey J. Gordon | J. Andrew Bagnell | Arun Venkatraman | Wen Sun
[1] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[2] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1994, Inf. Comput.
[3] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[4] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[5] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[6] A. Phatak,et al. Exploiting the connection between PLS, Lanczos methods and conjugate gradients: alternative proofs of some properties of PLS , 2002.
[7] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.
[8] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[9] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res.
[10] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[11] J. Andrew Bagnell,et al. Maximum margin planning , 2006, ICML.
[12] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[13] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[14] Michael H. Bowling,et al. Apprenticeship learning using linear programming , 2008, ICML '08.
[15] John Langford,et al. Search-based structured prediction , 2009, Machine Learning.
[16] J. Andrew Bagnell,et al. Efficient Reductions for Imitation Learning , 2010, AISTATS.
[17] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[18] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn.
[19] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn.
[20] Yisong Yue,et al. Learning Policies for Contextual Submodular Prediction , 2013, ICML.
[21] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[22] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[23] J. Andrew Bagnell,et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning , 2014, ArXiv.
[24] Martial Hebert,et al. Visual chunking: A list prediction framework for region-based object detection , 2014, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[25] Sébastien Bubeck,et al. Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn.
[26] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[27] Geoffrey J. Gordon,et al. Predicting Structure in Handwritten Algebra Data From Low Level Features , 2015.
[28] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[29] John Langford,et al. Learning to Search for Dependencies , 2015, ArXiv.
[30] Martial Hebert,et al. Improving Multi-Step Prediction of Learned Time Series Models , 2015, AAAI.
[31] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[32] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[33] John Langford,et al. Learning to Search Better than Your Teacher , 2015, ICML.
[34] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[35] Jianfeng Gao,et al. Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.
[36] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[37] Marc'Aurelio Ranzato,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
[38] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[39] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[40] Joelle Pineau,et al. An Actor-Critic Algorithm for Sequence Prediction , 2016, ICLR.
[41] Gireeja Ranade,et al. Adaptive Information Gathering via Imitation Learning , 2017, Robotics: Science and Systems.
[42] Sergey Levine,et al. PLATO: Policy learning using adaptive trajectory optimization , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).