[1] Andrea Lockerd Thomaz, et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning, 2013, NIPS.
[2] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.
[3] S. C. Jaquette. A Utility Criterion for Markov Decision Processes, 1976.
[4] A. Tversky, et al. Rational choice and the framing of decisions, 1990.
[5] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[6] Stuart Armstrong, et al. Occam's razor is insufficient to infer the preferences of irrational agents, 2017, NeurIPS.
[7] T. Koopmans. Stationary Ordinal Utility and Impatience, 1960.
[8] K. Vind, et al. Preferences over time, 2003.
[9] P. Diamond. The Evaluation of Infinite Utility Streams, 1965.
[10] Evan L. Porteus, et al. Temporal Resolution of Uncertainty and Dynamic Choice Theory, 1978.
[11] David M. Kreps. Decision Problems with Expected Utility Criteria, I: Upper and Lower Convergent Utility, 1977, Math. Oper. Res.
[12] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[13] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[14] M. Botvinick, et al. The successor representation in human reinforcement learning, 2016, Nature Human Behaviour.
[15] T. Koopmans, et al. Two papers on the representation of preference orderings: representation of preference orderings with independent components of consumption, and, Representation of preference orderings over time, 1972.
[16] David M. Kreps. Notes on the Theory of Choice, 1988.
[17] G. Loewenstein, et al. Time Discounting and Time Preference: A Critical Review, 2002.
[18] Joshua B. Tenenbaum, et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, 2016, NIPS.
[19] Silviu Pitis, et al. Source Traces for Temporal Difference Learning, 2018, AAAI.
[20] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[21] Evan L. Porteus. On the Optimality of Structured Policies in Countable Stage Decision Processes, 1975.
[22] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[23] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[24] R. Bellman. A Markovian Decision Process, 1957.
[25] Larry G. Epstein. Stationary cardinal utility and optimal growth under uncertainty, 1983.
[26] M. Machina. Dynamic Consistency and Non-expected Utility Models of Choice under Uncertainty, 1989.
[27] John C. Harsanyi, et al. Cardinal Utility in Welfare Economics and in the Theory of Risk-taking, 1953, Journal of Political Economy.
[28] J. Rawls. A Theory of Justice, 1999.
[29] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[30] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[31] Alan Fern, et al. A Bayesian Approach for Policy Learning from Trajectory Preference Queries, 2012, NIPS.
[32] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[33] Michèle Sebag, et al. Preference-Based Policy Learning, 2011, ECML/PKDD.
[34] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[35] Matthew J. Sobel, et al. Discounting axioms imply risk neutrality, 2012, Annals of Operations Research.
[36] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[37] Stuart J. Russell, et al. Rationality and Intelligence: A Brief Update, 2013, PT-AI.
[38] B. Nordstrom. Finite Markov Chains, 2005.
[39] Peter A. Streufert. Ordinal Dynamic Programming, 1991.