Doina Precup | Michael L. Littman | Mark K. Ho | David Abel | Anna Harutyunyan | Will Dabney | Satinder Singh
[1] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.
[2] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[3] Michael H. Bowling, et al. Apprenticeship learning using linear programming, 2008, ICML '08.
[4] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[5] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[6] Ufuk Topcu, et al. Environment-Independent Task Specifications via GLTL, 2017, ArXiv.
[7] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[8] Jonathan Uesato, et al. REALab: An Embedded Perspective on Tampering, 2020, ArXiv.
[9] Smaranda Muresan, et al. Grounding English Commands to Reward Functions, 2015, Robotics: Science and Systems.
[10] David H. Ackley, et al. Interactions between learning and evolution, 1991.
[11] Narendra Karmarkar, et al. A new polynomial-time algorithm for linear programming, 1984, Comb.
[12] G. Debreu. Mathematical Economics: Representation of a preference ordering by a numerical function, 1983.
[13] Richard L. Lewis, et al. Where Do Rewards Come From?, 2009.
[14] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[15] L. G. Mitten. Preference Order Dynamic Programming, 1974.
[16] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[17] Peter Stone, et al. Interactively shaping agents via human reinforcement: the TAMER framework, 2009, K-CAP '09.
[18] David M. Kreps. Notes On The Theory Of Choice, 1988.
[19] Alan Fern, et al. A Bayesian Approach for Policy Learning from Trajectory Preference Queries, 2012, NIPS.
[20] Richard L. Lewis, et al. Reward Design via Online Gradient Ascent, 2010, NIPS.
[21] Sheila A. McIlraith, et al. Teaching Multiple Tasks to an RL Agent using LTL, 2018, AAMAS.
[22] Karl J. Friston, et al. Reinforcement Learning or Active Inference?, 2009, PLoS ONE.
[23] Satinder Singh Baveja. The Optimal Reward Problem: Designing Effective Reward for Bounded Agents, 2011.
[24] Stefanie Tellex, et al. Learning to Parse Natural Language to Grounded Reward Functions with Weak Supervision, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[25] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[26] Ruosong Wang, et al. Preference-based Reinforcement Learning with Finite-Time Guarantees, 2020, NeurIPS.
[27] Matthew J. Sobel, et al. Discounting axioms imply risk neutrality, 2012, Annals of Operations Research.
[28] Geraud Nangue Tasse, et al. A Boolean Task Algebra for Reinforcement Learning, 2020, NeurIPS.
[29] Karl J. Friston. The free-energy principle: a unified brain theory?, 2010, Nature Reviews Neuroscience.
[30] Paul Weng, et al. Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences, 2011, ICAPS.
[31] Silviu Pitis, et al. Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach, 2019, AAAI.
[32] David L. Roberts, et al. Convergent Actor Critic by Humans, 2016.
[33] Sheila A. McIlraith, et al. Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning, 2018, ICML.
[34] Matthew E. Taylor, et al. Maximum Reward Formulation In Reinforcement Learning, 2020, ArXiv.
[35] Guan Wang, et al. Interactive Learning from Policy-Dependent Human Feedback, 2017, ICML.
[36] Michael Wooldridge, et al. Multi-Agent Reinforcement Learning with Temporal Logic Specifications, 2021, AAMAS.
[37] Karl J. Friston, et al. Action and Perception as Divergence Minimization, 2020, ArXiv.
[38] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[39] Stuart J. Russell, et al. Benefits of Assistance over Reward Learning, 2020.
[40] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[41] T. Koopmans. Stationary Ordinal Utility and Impatience, 1960.
[42] Peter A. Streufert. Ordinal Dynamic Programming, 1991.
[43] Anca D. Dragan, et al. Reward-rational (implicit) choice: A unifying formalism for reward learning, 2020, NeurIPS.
[44] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[45] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[46] Calin Belta, et al. Reinforcement learning with temporal logic rewards, 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[47] Anca D. Dragan, et al. The Off-Switch Game, 2016, IJCAI.
[48] Anca D. Dragan, et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.
[49] Junhyuk Oh, et al. What Can Learned Intrinsic Rewards Capture?, 2019, ICML.
[50] Daniel Dewey. Reinforcement Learning and the Reward Engineering Principle, 2014, AAAI Spring Symposia.
[51] Joel W. Burdick, et al. Dueling Posterior Sampling for Preference-Based Reinforcement Learning, 2019, UAI.
[52] Nan Jiang, et al. Repeated Inverse Reinforcement Learning, 2017, NIPS.
[53] A. Barto, et al. On Separating Agent Designer Goals from Agent Goals: Breaking the Preferences–Parameters Confound, 2010.
[54] Michael Matthews, et al. The Alignment Problem: Machine Learning and Human Values, 2022, Personnel Psychology.
[55] Marcus Hutter, et al. Axioms for Rational Reinforcement Learning, 2011, ALT.
[56] Yoshua Bengio, et al. Hyperbolic Discounting and Learning over Multiple Horizons, 2019, ArXiv.
[57] Rajeev Alur, et al. A Composable Specification Language for Reinforcement Learning Tasks, 2020, NeurIPS.
[58] Laurent Orseau, et al. Reinforcement Learning with a Corrupted Reward Channel, 2017, IJCAI.
[59] Nathalie Bertrand, et al. The Steady-State Control Problem for Markov Decision Processes, 2013, QEST.
[60] Anca D. Dragan, et al. Inverse Reward Design, 2017, NIPS.
[61] Maja J. Mataric, et al. Reward Functions for Accelerated Learning, 1994, ICML.
[62] Doina Precup, et al. Reward is enough, 2021, Artif. Intell.