Yoshua Bengio | Doina Precup | Rola Dali | Jhelum Chakravorty | Junhao Wang | David Venuto | Leonard Boussioux
[1] Karl Johan Åström, et al. Optimal control of Markov processes with incomplete state information, 1965.
[2] E. Altman. Constrained Markov Decision Processes, 1999.
[3] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[4] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[5] K. Dautenhahn, et al. Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions, 2009.
[6] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[7] Shie Mannor, et al. Policy Gradients Beyond Expectations: Conditional Value-at-Risk, 2014, ArXiv.
[8] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[9] Mohammad Ghavamzadeh, et al. Algorithms for CVaR Optimization in MDPs, 2014, NIPS.
[10] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[11] Alex Graves, et al. Conditional Image Generation with PixelCNN Decoders, 2016, NIPS.
[12] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[13] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[14] Martin Buss, et al. Understanding Human Avoidance Behavior: Interaction-Aware Decision Making Based on Game Theory, 2016, Int. J. Soc. Robotics.
[15] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[16] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[17] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[18] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[19] T. Robbins, et al. Value generalization in human avoidance learning, 2017, bioRxiv.