[1] Thomas Hofmann, et al. TrueSkill™: A Bayesian Skill Rating System, 2007.
[2] Alessandro Lazaric, et al. Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits, 2011, ALT.
[3] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, arXiv.
[4] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[5] Ian R. Fasel, et al. Design Principles for Creating Human-Shapable Agents, 2009, AAAI Spring Symposium: Agents that Learn from Human Teachers.
[6] Eugene Santos, et al. Explaining Reward Functions in Markov Decision Processes, 2019, FLAIRS.
[7] H. Gulliksen. A least squares solution for paired comparisons with incomplete data, 1956.
[8] Michèle Sebag, et al. APRIL: Active Preference-learning based Reinforcement Learning, 2012, ECML/PKDD.
[9] Daniel Kahneman, et al. Evaluation by Moments: Past and Future, 2002.
[10] F. Mosteller. Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations, 1951.
[11] László Csató, et al. A graph interpretation of the least squares ranking method, 2015, Soc. Choice Welf.
[12] Jonathan Lawry, et al. TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments, 2020, AAAI.
[13] D. Hunter. MM algorithms for generalized Bradley-Terry models, 2003.
[14] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[15] Wei-Yin Loh, et al. Classification and regression trees, 2011, WIREs Data Mining Knowl. Discov.
[16] Takeo Igarashi, et al. A Survey on Interactive Reinforcement Learning: Design Principles and Open Challenges, 2020, Conference on Designing Interactive Systems.
[17] Richard L. Lewis, et al. Where Do Rewards Come From?, 2009.
[18] Daniel Dewey, et al. Reinforcement Learning and the Reward Engineering Principle, 2014, AAAI Spring Symposia.
[19] R. A. Bradley, et al. Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons, 1952.
[20] L. Thurstone. A law of comparative judgment, 1994.
[21] Jude W. Shavlik, et al. Creating Advice-Taking Reinforcement Learners, 1998, Machine Learning.
[22] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[23] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[24] Devavrat Shah, et al. Iterative ranking from pair-wise comparisons, 2012, NIPS.
[25] Alan Fern, et al. Explainable Reinforcement Learning via Reward Decomposition, 2019.
[26] Dorsa Sadigh, et al. APReL: A Library for Active Preference-based Reward Learning Algorithms, 2022, 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[27] Stratis Ioannidis, et al. Experimental Design under the Bradley-Terry Model, 2018, IJCAI.
[28] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[29] M. de Rijke, et al. Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem, 2013, ICML.
[30] Stuart J. Russell, et al. Understanding Learned Reward Functions, 2020, arXiv.
[31] Anca D. Dragan, et al. Inverse Reward Design, 2017, NIPS.
[32] A. Tversky, et al. Judgment under Uncertainty: Heuristics and Biases, 1974, Science.