[1] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[2] M. Woodroofe. A central limit theorem for functions of a Markov chain with applications to shifts, 1992.
[3] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[4] E. Altman. Constrained Markov Decision Processes, 1999.
[5] Alexander Shapiro, et al. Optimization of Convex Risk Functions, 2006, Math. Oper. Res.
[6] Masashi Sugiyama, et al. Nonparametric Return Distribution Approximation for Reinforcement Learning, 2010, ICML.
[7] Jia Yuan Yu, et al. Effect of Reward Function Choices in MDPs with Value-at-Risk, 2016, 1612.02088.
[8] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[9] D. White. Mean, variance, and probabilistic criteria in finite Markov decision processes: A review, 1988.
[10] Philippe Artzner, et al. Coherent Measures of Risk, 1999.
[11] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[12] D. Krass, et al. Percentile performance criteria for limiting average Markov decision processes, 1995, IEEE Trans. Autom. Control.
[13] Geoffrey S. Watson, et al. Distribution Theory for Tests Based on the Sample Distribution Function, 1973.
[14] Galin L. Jones. On the Markov chain central limit theorem, 2004, math/0409112.
[15] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[16] J. Hess, et al. Analysis of variance, 2018, Transfusion.
[17] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[18] Michael C. Fu, et al. Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control, 2015, ICML.
[19] Matthew J. Sobel, et al. Mean-Variance Tradeoffs in an Undiscounted MDP, 1994, Oper. Res.
[20] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.