An Axiomatic Approach to Rationality for Reinforcement Learning Agents
[1] M. Botvinick, et al. The successor representation in human reinforcement learning, 2016, bioRxiv.
[2] Silviu Pitis, et al. Source Traces for Temporal Difference Learning, 2018, AAAI.
[3] John G. Kemeny, et al. Finite Markov Chains, 1960.
[4] M. J. Sobel. Ordinal Dynamic Programming, 1975.
[5] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[6] T. Koopmans, et al. Two papers on the representation of preference orderings: representation of preference orderings with independent components of consumption, and, Representation of preference orderings over time, 1972.
[7] David M. Kreps. Notes on the Theory of Choice, 1988.
[8] J. Rawls. A Theory of Justice, 1999.
[9] Shane Frederick, et al. Time Discounting and Time Preference: A Critical Review, 2002.
[10] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[11] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[12] Joshua B. Tenenbaum, et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, 2016, NIPS.
[13] Stuart Armstrong, et al. Impossibility of deducing preferences and rationality from human policy, 2017, NIPS 2018.
[14] Evan L. Porteus. On the Optimality of Structured Policies in Countable Stage Decision Processes, 1975.
[15] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[16] Larry G. Epstein. Stationary cardinal utility and optimal growth under uncertainty, 1983.
[17] Stuart J. Russell. Learning agents for uncertain environments (extended abstract), 1998, COLT '98.
[18] M. Machina. Dynamic Consistency and Non-expected Utility Models of Choice under Uncertainty, 1989.
[19] Evan L. Porteus, et al. Temporal Resolution of Uncertainty and Dynamic Choice Theory, 1978.
[20] David M. Kreps. Decision Problems with Expected Utility Criteria, I: Upper and Lower Convergent Utility, 1977, Math. Oper. Res.
[21] S. C. Jaquette. A Utility Criterion for Markov Decision Processes, 1976.
[22] Andrea Lockerd Thomaz, et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning, 2013, NIPS.
[23] T. Koopmans. Stationary Ordinal Utility and Impatience, 1960.
[24] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[25] Matthew J. Sobel, et al. Discounting axioms imply risk neutrality, 2012, Annals of Operations Research.
[26] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[27] Stuart J. Russell, et al. Rationality and Intelligence: A Brief Update, 2013, PT-AI.
[28] J. von Neumann, et al. Theory of Games and Economic Behavior, 1944, Princeton University Press.
[29] Evan L. Porteus, et al. Dynamic Choice Theory and Dynamic Programming, 1979.
[30] P. Diamond. The Evaluation of Infinite Utility Streams, 1965.
[31] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[32] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[33] Alan Fern, et al. A Bayesian Approach for Policy Learning from Trajectory Preference Queries, 2012, NIPS.
[34] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[35] Michèle Sebag, et al. Preference-Based Policy Learning, 2011, ECML/PKDD.
[36] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[37] R. Bellman. A Markovian Decision Process, 1957.
[38] A. Tversky, et al. Rational choice and the framing of decisions, 1990.