Rémi Munos | David Silver | Georg Ostrovski | Will Dabney
[1] J. von Neumann, O. Morgenstern. Theory of Games and Economic Behavior, 1944, Princeton University Press.
[2] R. Howard, et al. Risk-Sensitive Markov Decision Processes, 1972.
[3] S. C. Jaquette. Markov Decision Processes with a New Optimality Criterion: Discrete Time, 1973.
[4] Frederick R. Forst, et al. On robust estimation of the location parameter, 1980.
[5] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[6] M. Yaari. The Dual Theory of Choice under Risk, 1987.
[7] D. White. Mean, variance, and probabilistic criteria in finite Markov decision processes: A review, 1988.
[8] A. Tversky, et al. Advances in prospect theory: Cumulative representation of uncertainty, 1992.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] C. J. C. H. Watkins. Learning from delayed rewards, 1989, PhD thesis, University of Cambridge.
[11] Shaun S. Wang. Premium Calculation by Transforming the Layer Premium Density, 1996, ASTIN Bulletin.
[12] Richard Gonzalez, et al. Curvature of the Probability Weighting Function, 1996.
[13] Daniel Hernández-Hernández, et al. Risk Sensitive Markov Decision Processes, 1997.
[14] A. Müller. Integral Probability Metrics and Their Generating Classes of Functions, 1997, Advances in Applied Probability.
[15] Richard Gonzalez, et al. On the Shape of the Probability Weighting Function, 1999, Cognitive Psychology.
[16] Shaun S. Wang. A Class of Distortion Operators for Pricing Financial and Insurance Risks, 2000.
[17] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[18] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[19] Masashi Sugiyama, et al. Nonparametric Return Distribution Approximation for Reinforcement Learning, 2010, ICML.
[20] Matthieu Geist, et al. Kalman Temporal Differences, 2010, J. Artif. Intell. Res.
[21] Masashi Sugiyama, et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[22] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[23] Jan Dhaene, et al. Remarks on quantiles and distortion risk measures, 2012.
[24] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[25] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[26] Mohammad Ghavamzadeh, et al. Algorithms for CVaR Optimization in MDPs, 2014, NIPS.
[27] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[28] Shane Legg, et al. Massively Parallel Methods for Deep Reinforcement Learning, 2015, arXiv.
[29] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[30] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[31] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[32] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[33] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[34] Kuan-Ting Yu, et al. More than a million ways to be pushed: A high-fidelity experimental dataset of planar pushing, 2016, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[35] Yee Whye Teh, et al. Particle Value Functions, 2017, ICLR.
[36] O. Bousquet, et al. From optimal transport to generative modeling: the VEGAN cookbook, 2017, arXiv:1705.07642.
[37] Catholijn M. Jonker, et al. Efficient exploration with Double Uncertain Value Networks, 2017, arXiv.
[38] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.
[39] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[40] Marco Pavone, et al. How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics, 2017, ISRR.
[41] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[42] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[43] Bernhard Schölkopf, et al. Wasserstein Auto-Encoders, 2017, ICLR.
[44] Yee Whye Teh, et al. An Analysis of Categorical Distributional Reinforcement Learning, 2018, AISTATS.
[45] Marc G. Bellemare, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning, 2017, ICLR.
[46] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[47] Naomi S. Altman, et al. Quantile regression, 2019, Nature Methods.