Preference-Based Reinforcement Learning: A Preliminary Survey

Preference-based reinforcement learning has gained significant popularity over the years, but it is still unclear what exactly preference learning is and how it relates to other reinforcement learning tasks. In this paper, we present a general definition of preferences as well as some insight how these approaches compare to reinforcement learning, inverse reinforcement learning and other related approaches. Additionally, we are offering a coarse categorization of preference-based reinforcement learning algorithms and a preliminary survey based on this allocation.

[1]  Anne Auger,et al.  Convergence results for the (1, lambda)-SA-ES using the theory of phi-irreducible Markov chains , 2005, Theor. Comput. Sci..

[2]  Stuart J. Russell Learning agents for uncertain environments (extended abstract) , 1998, COLT' 98.

[3]  Eyke Hüllermeier,et al.  Preference-based reinforcement learning: a formal framework and a policy iteration algorithm , 2012, Mach. Learn..

[4]  Nikolaus Hansen,et al.  Evaluating the CMA Evolution Strategy on Multimodal Test Functions , 2004, PPSN.

[5]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[6]  A. Auger Convergence results for the ( 1 , )-SA-ES using the theory of-irreducible Markov chains , 2005 .

[7]  Christian Igel,et al.  Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search , 2009, ICML '09.

[8]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[9]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[10]  Nicolò Cesa-Bianchi,et al.  Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[11]  Alan Fern,et al.  A Bayesian Approach for Policy Learning from Trajectory Preference Queries , 2012, NIPS.

[12]  Michèle Sebag,et al.  Preference-Based Policy Learning , 2011, ECML/PKDD.

[13]  Eyke Hüllermeier,et al.  Preference-based Evolutionary Direct Policy Search , 2013 .

[14]  Eyke Hüllermeier,et al.  Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm , 2014, Machine Learning.

[15]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[16]  Johannes Fürnkranz,et al.  A Policy Iteration Algorithm for Learning from Preference-Based Feedback , 2013, IDA.

[17]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[18]  Eyke Hüllermeier,et al.  Label ranking by learning pairwise preferences , 2008, Artif. Intell..

[19]  Michèle Sebag,et al.  APRIL: Active Preference-learning based Reinforcement Learning , 2012, ECML/PKDD.

[20]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 2005, IEEE Transactions on Neural Networks.