Model-Free Preference-Based Reinforcement Learning
Johannes Fürnkranz | Christian Wirth | Gerhard Neumann
[1] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[2] Jan Peters,et al. Hierarchical Relative Entropy Policy Search , 2014, AISTATS.
[3] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res.
[4] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[5] Csaba Szepesvari. Least Squares Temporal Difference Learning and Galerkin's Method , 2011 .
[6] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[7] Nikolaus Hansen,et al. Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.
[8] Christopher K. I. Williams,et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .
[9] Michèle Sebag,et al. APRIL: Active Preference-learning based Reinforcement Learning , 2012, ECML/PKDD.
[10] Ryan P. Adams,et al. Elliptical slice sampling , 2009, AISTATS.
[11] Christoph H. Lampert,et al. Movement templates for learning of hitting and batting , 2010, 2010 IEEE International Conference on Robotics and Automation.
[12] Johannes Fürnkranz,et al. Preference-Based Reinforcement Learning: A Preliminary Survey , 2013 .
[13] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[14] Zoubin Ghahramani,et al. Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.
[15] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[16] Alan Fern,et al. A Bayesian Approach for Policy Learning from Trajectory Preference Queries , 2012, NIPS.
[17] Michèle Sebag,et al. Programming by Feedback , 2014, ICML.
[18] Jan Peters,et al. Data-Efficient Generalization of Robot Skills with Contextual Policy Search , 2013, AAAI.
[19] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[20] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[21] Johannes Fürnkranz,et al. EPMC: Every Visit Preference Monte Carlo for Reinforcement Learning , 2013, ACML.
[22] Alborz Geramifard,et al. A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning , 2013, Found. Trends Mach. Learn.
[23] Matthew W. Hoffman,et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization , 2011, EWRL.
[24] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .