[1] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[2] J. Cockcroft. Investment in Science, 1962, Nature.
[3] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.
[4] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[5] Ward Whitt, et al. Approximations of Dynamic Programs, II, 1979, Math. Oper. Res.
[6] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[7] M. J. Sobel, et al. Discounted MDP's: distribution functions and exponential utility maximization, 1987.
[8] P. Whittle. Restless Bandits: Activity Allocation in a Changing World, 1988.
[9] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[10] E. Altman. Constrained Markov Decision Processes, 1999.
[11] Philippe Artzner, et al. Coherent Measures of Risk, 1999.
[12] Mihalis Yannakakis, et al. On the approximability of trade-offs and optimal access of Web sources, 2000, Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
[13] Frank Riedel, et al. Dynamic Coherent Risk Measures, 2003.
[14] L. Ghaoui, et al. Robust Markov decision processes with uncertain transition matrices, 2004.
[15] Sven Koenig, et al. Risk-Sensitive Planning with One-Switch Utility Functions: Value Iteration, 2005, AAAI.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.
[18] Sean P. Meyn. Control Techniques for Complex Networks: Workload, 2007.
[19] J. Tsitsiklis, et al. Robust, risk-sensitive, and data-driven control of Markov decision processes, 2007.
[20] Javier de Frutos, et al. Approximation of Dynamic Programs, 2012.