Direct Policy Iteration with Demonstrations
[1] Joshua B. Tenenbaum, et al. Nonparametric Bayesian Policy Priors for Reinforcement Learning, 2010, NIPS.
[2] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[3] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[4] K. I. M. McKinnon, et al. On the Generation of Markov Decision Processes, 1995.
[5] Matthieu Geist, et al. Boosted Bellman Residual Minimization Handling Expert Demonstrations, 2014, ECML/PKDD.
[6] J. Andrew Bagnell, et al. Efficient Reductions for Imitation Learning, 2010, AISTATS.
[7] Alessandro Lazaric, et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
[8] J. Andrew Bagnell, et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning, 2014, ArXiv.
[9] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[10] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[11] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[12] Joelle Pineau, et al. Learning from Limited Demonstrations, 2013, NIPS.
[13] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[14] Andrew Y. Ng, et al. Pharmacokinetics of a novel formulation of ivermectin after administration to goats, 2000, ICML.
[15] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[16] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.
[17] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[18] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom..